P2P-based multidimensional indexing methods: A survey

Chong Zhang*, Weidong Xiao, Daquan Tang, Jiuyang Tang

C4ISR Technology National Defense Science and Technology Key LAB, National University of Defense Technology, Changsha 410073, PR China

The Journal of Systems and Software 84 (2011) 2348–2362. Journal homepage: www.elsevier.com/locate/jss

Article history: Received 26 August 2010; received in revised form 4 July 2011; accepted 6 July 2011; available online 20 July 2011.

Keywords: Multidimensional index; Peer-to-peer computing; Survey

* Corresponding author. Tel.: +86 134 6906 7028. E-mail addresses: [email protected] (C. Zhang), [email protected] (W. Xiao), [email protected] (D. Tang), jiuyang [email protected] (J. Tang).

0164-1212/$ see front matter © 2011 Elsevier Inc. All rights reserved. doi:10.1016/j.jss.2011.07.027

Abstract

P2P-based multidimensional indexing (MI) is a research hotspot that has attracted many researchers. However, no summary or review of this technology has been made so far. To the best of our knowledge, this is the first work to review P2P-based MI. This paper adopts a visualization technique to show the research groups and then analyzes the investigating style of each group. Based on how P2P-based MI evolved from centralized MI and P2P, we divide P2P-based MI methods into 4 categories: extending centralized MI, extending P2P, combining centralized MI and P2P, and miscellaneous. For each category, the paper selects classical techniques and describes them in detail. This is the first classification of the large body of related work. Finally, load balancing and update strategies are described and discussed, because they are important factors in performance. We believe many researchers will benefit from our work in further studies.

1. Introduction

Querying multidimensional data has attracted a lot of attention in the database community over the last decades (Gaede and Gunther, 1998; Bertino and Ooi, 1999). Effective management of multidimensional data is very useful in many fields, such as geosciences, CAD, robotics, and environmental protection. Multidimensional indexing (MI) is a key technique for improving query efficiency on such data, and it is also a challenging issue to which many researchers have devoted themselves (Bertino et al., 1997; Samet, 1990a,b). In recent years, the research focus in multidimensional indexing has shifted from the centralized paradigm to the decentralized one, especially to P2P-based MI methods (Mondal et al., 2004; Cai et al., 2003; Bharambe et al., 2004; Tanin et al., 2007).

Since the middle of the last century, MI techniques have developed rapidly. Gaede and Gunther (1998) reviewed MI methods in detail, summarizing and categorizing their applications, features and technical details. Simply put, MI aims to accelerate queries on multidimensional data. The basic idea behind MI is to organize the underlying data from a global view, considering all dimensions together. Better indexing structures and search algorithms are preferred to improve the querying procedure. The essential function of MI is pruning, i.e., cutting away many useless search paths.

Peer-to-Peer computing (P2P) (Lua et al., 2005; Milojicic and Kalogeraki, 2002) has emerged as a whole new paradigm over the last decades. It emphasizes that participating peers are independent and autonomous, and that they cooperatively accomplish computations and form a self-organized and adaptive sharing community. We analyze the motivation for P2P-based MI methods as follows (Fig. 1):

(1) Centralized MI methods need to be decentralized to achieve high scalability (Mondal et al., 2004). Because more and more multidimensional data are being used, "single point" MI cannot scale well when the load increases greatly. Poor performance, single points of failure and bottlenecks make the whole system run inefficiently. P2P is one of the best solutions to the problems brought by the centralized mode of MI.

(2) P2P systems need to be equipped with multidimensional complex query processing capabilities (Cai et al., 2003; Bharambe et al., 2004; Tanin et al., 2007). In the P2P community, more and more users tend to issue complex queries to find objects that match requirements from a multidimensional perspective. For instance, in P2P multi-player games (Lee et al., 2004), P2P job-search networks (Tanin et al., 2007) or P2P auctions, people usually want to find all other players in a specified area (2-dimensional or 3-dimensional geometry), all jobs that suit multidimensional requirements, or the most satisfying product that is both of good quality and cheap. Traditional P2P techniques provide exact-match querying capability but only poor mechanisms for multidimensional (or multi-
attribute) range search and similarity search. So, P2P should be equipped with MI components.

Fig. 1. The motivation of P2P-based MI. Centralized MI and P2P try to develop towards each other. (Labels in the figure: Database field; Distributed computing field; Distributed database; Centralized MI; P2P-based MI; Peer-to-Peer computing; "For scalability and high performance"; "For the capability of multidimensional complex query processing".)

Table 1
Comparison of centralized MI and P2P-based MI.

| | MI in central mode | MI in P2P mode |
|---|---|---|
| Node | A section of memory or disk | An autonomous computer (a peer) |
| Pruning | Reducing I/O times | Reducing communication cost |
| How to prune | Partitioning the space, designing suitable indexing structure and centralized searching algorithm | Partitioning the space, designing scalable overlay network and distributed searching algorithm |

Although P2P-based MI is a kind of distributed computing, the basic principle behind the technique is still pruning. In centralized mode, pruning mostly means reducing I/O, while in P2P mode, pruning mostly means reducing communication cost. Table 1 compares MI in centralized mode with MI in P2P mode.

This paper reviews P2P-based MI methods. To the best of our knowledge, this is the first work to summarize and review them. In addition, we apply a new technique to our survey, visualization, to give readers a more intuitive view of the state of the art. The main contributions of this paper are as follows:

(1) For the first time, we survey P2P-based MI methods, which are new to both the database community and the P2P community.

(2) We adopt a visualization method in this survey to show the research groups and analyze their research styles. We believe this is a new concept compared with surveys in other fields.

(3) We summarize load balancing algorithms and update strategies in the field of P2P-based MI, which are 2 important factors in system performance. This will also be helpful for designing optimal indexing structures.

The rest of this paper is organized as follows. Section 2 describes the query types in P2P-based MI and gives a classification. Section 3 summarizes the evolution of P2P-based MI methods. Section 4 presents our new concept, visualizing the research groups, and our analysis of their research styles. Section 5 summarizes and compares the P2P-based MI methods. Sections 6–9 describe each structure in detail, by category. Section 10 summarizes load balancing and update strategies in P2P-based MI methods. Section 11 concludes the paper.

2. Query types and classification

2.1. Query types

Gaede and Gunther (1998) listed 9 kinds of multidimensional query types. We have scanned the state of the art in P2P-based MI methods, and the following query types are the ones usually discussed:

(1) Window Query (WQ): Given a d-dimensional interval $I^d = [l_1, u_1] \times [l_2, u_2] \times \cdots \times [l_d, u_d]$, return all objects $o$ in all peers that intersect $I^d$: $WQ(I^d) = \{o \mid I^d \cap o \neq \emptyset\}$.

(2) Range Query (RQ): This kind of query is usually used in similarity search. Given a query object $q$ and a range $\delta$, return all objects $o$ in all peers whose distance to $q$ is not more than $\delta$: $RQ(q, \delta) = \{o \mid \|o - q\| \le \delta\}$, where $\|\cdot\|$ denotes a similarity distance.

(3) k Nearest Neighbor Query (kNNQ): Given a query object $q$ and an integer $k$, return from all peers the k-tuple $(o_1, o_2, \ldots, o_k)$, in which $o_i$ is the $i$-th nearest neighbor of $q$.
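The three query types can be made concrete with a brute-force sketch. It is only a reference for what each query should return, not one of the surveyed indexing methods: it assumes Euclidean distance as the similarity distance, models each peer as a plain Python list of local objects, and simply scans every peer. The function names and the sample data are illustrative.

```python
import heapq
import math

def intersects(box, window):
    """True if two axis-aligned boxes, given as [(low, high), ...], overlap."""
    return all(lo <= w_hi and w_lo <= hi
               for (lo, hi), (w_lo, w_hi) in zip(box, window))

def window_query(peers, window):
    """WQ: every stored box, on any peer, that intersects the window."""
    return [box for local in peers for box in local if intersects(box, window)]

def range_query(peers, q, delta):
    """RQ: every stored point, on any peer, within distance delta of q."""
    return [p for local in peers for p in local if math.dist(p, q) <= delta]

def knn_query(peers, q, k):
    """kNNQ: the k stored points closest to q, nearest first."""
    everything = (p for local in peers for p in local)
    return heapq.nsmallest(k, everything, key=lambda p: math.dist(p, q))

# Two hypothetical peers holding 2-D points, and two holding 2-D boxes.
point_peers = [[(0, 0), (1, 1)], [(3, 3), (5, 5)]]
box_peers = [[((0, 1), (0, 1))], [((2, 3), (2, 3))]]

print(window_query(box_peers, [(0.5, 1.5), (0.5, 1.5)]))
print(range_query(point_peers, (0, 0), 1.5))
print(knn_query(point_peers, (0.9, 0.9), 2))
```

Every peer is contacted here, which is exactly the cost that the P2P-based MI methods in the following sections try to avoid by pruning.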

2.2. Classifications

P2P-based MI methods can be classified by different criteria.

(1) From the view of evolution, P2P-based MI methods can be classified into extending centralized MI, extending P2P, combining centralized MI and P2P, etc. This kind of classification is explained in Section 3;

(2) From the view of application, P2P-based MI methods can be classified into P2P-based spatial indexes (especially for GIS), P2P-based multi-attribute indexes (usually for databases), and P2P-based CBIR-oriented indexes (usually for multimedia applications);

(3) From the view of query types, P2P-based MI methods can be classified into P2P-based spatial (or multi-attribute) range search indexes and P2P-based similarity search (including RQ and kNNQ) indexes.

3. Deriving and evolutions

P2P-based MI inherits the basic ideas of traditional centralized indexing techniques, P2P techniques, or both. We illustrate the evolution map of P2P-based MI in Fig. 2, from which the following can be observed:

Fig. 2. Evolution map of P2P-based MI.

(1) Inheriting tree-like centralized indexing methods: most centralized indexes are organized as trees (Samet, 1990a,b), because this kind of structure is very efficient at pruning search paths. For example, the binary tree, B-tree and R-tree (Guttman, 1984) are very useful for memory-based searching, disk-based 1-dimensional searching and disk-based spatial searching, respectively. They organize data hierarchically. The R-tree applies the basic idea of the B-tree to multidimensional spaces. In the R-tree's leaf nodes, items point to the actual spatial data, while in the internal nodes, items point to the roots of their sub-trees and store spatial summaries for navigating the search. A multidimensional query is first delivered to the root, which directs the query to the sub-trees that possibly contain results, based on whether the spatial summaries in the items intersect the query. The internal nodes that receive the query repeat the same procedure until the leaf nodes are reached. P2P-based MIs that inherit tree-like centralized indexes organize the peers into a tree structure. Every peer maintains a partial index, which functions like a routing table. This kind of derivation is a transformation from centralized mode to P2P mode, and new overlay network protocols and algorithms are created.

(2) Inheriting P2P methods: P2P systems can be roughly classified into structured P2P and unstructured P2P (Lua et al., 2005; Milojicic and Kalogeraki, 2002). In the former, peers are organized into structured overlays, such as a ring (Stoica et al., 2001), tree (Jagadosj et al., 2005), hypercube, etc. Peers must constantly maintain such an overlay by communicating with each other cooperatively. The DHT (Stoica et al., 2001; Ratnasamy et al., 2001; Rowstron and Druschel, 2001; Zhao et al., 2004) is one of the most important structured overlays: each peer or resource is assigned a unique key, and with these keys peers can efficiently store and look up resources through the distributed hash table maintained by the peers. Unstructured P2P does not impose a strict overlay network on peers. Peers simply connect to other peers at random, so the maintenance overhead is lower than in structured P2P. However, to look up a resource, peers often pay a much higher cost than in structured P2P, because flooding is often adopted and searching becomes blind. Neither of them can handle complex multidimensional queries efficiently. To solve this problem, multidimensional query processing algorithms are introduced into the P2P community to extend P2P to support multidimensional queries.

(3) Inheriting both centralized multidimensional indexes and P2P: traditional multidimensional indexes are capable of multidimensional space partitioning, and P2P can organize peers in a distributed fashion to deliver messages to their destinations efficiently, so combining these two advantages brings more benefits for distributed multidimensional query processing.
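The top-down pruning described for the R-tree in item (1) can be sketched as follows. This is a minimal illustration rather than Guttman's full algorithm: insertion and node splitting are omitted, the tree is built by hand, and the `Node` class, the `stats` counter and the sample boxes are assumptions introduced for the example. In a P2P setting, each node could be read as a peer and each recursive call as a forwarded message.

```python
from dataclasses import dataclass, field

def intersects(a, b):
    """True if two axis-aligned boxes, given as ((low, high), ...), overlap."""
    return all(a_lo <= b_hi and b_lo <= a_hi
               for (a_lo, a_hi), (b_lo, b_hi) in zip(a, b))

@dataclass
class Node:
    mbr: tuple                                     # spatial summary of the subtree
    children: list = field(default_factory=list)   # internal node: child Nodes
    objects: list = field(default_factory=list)    # leaf node: (box, name) pairs

def search(node, query, stats):
    """Return names of objects intersecting `query`, counting visited nodes."""
    stats["visited"] += 1
    if not node.children:                          # leaf: test the real objects
        return [name for box, name in node.objects if intersects(box, query)]
    hits = []
    for child in node.children:
        if intersects(child.mbr, query):           # prune non-intersecting subtrees
            hits.extend(search(child, query, stats))
    return hits

left = Node(((0, 4), (0, 4)),
            objects=[(((1, 2), (1, 2)), "a"), (((3, 4), (3, 4)), "b")])
right = Node(((6, 10), (6, 10)),
             objects=[(((6, 7), (6, 7)), "c"), (((9, 10), (9, 10)), "d")])
root = Node(((0, 10), (0, 10)), children=[left, right])

stats = {"visited": 0}
print(search(root, ((0, 2.5), (0, 2.5)), stats), stats)
```

The right subtree is never visited for this query because its summary does not intersect the query window, which is the pruning effect the paragraph describes.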

4. Research groups and investigating style analysis

Many researchers have devoted much effort to P2P-based MI, and they have formed different research groups according to their academic ideas and styles. This paper explores the relationships among authors and contributions. Furthermore, certain investigating styles can be seen within a research group. This is done by visualizing the relationships among authors, co-authors and contributions (Prefuse, 2010).

We visualize the whole state of the art in Fig. 3. Contributions and authors are marked with different colors. There are 3 larger clusters, representing 3 different research groups.

Based on the number of edges connecting author nodes to contribution nodes, each research group has some representative researchers (see Table 2).

Fig. 3. Visualization of research groups.

Table 2
Analysis of research groups and investigating styles.

| Research groups | Representatives | Research organizations | Investigating style |
|---|---|---|---|
| Group A | B.C. Ooi, H.V. Jagadish, K.-L. Tan, A. Zhou | National University of Singapore, University of Michigan, Fudan University | Often using tree-like topology and space filling curves to map objects from high-dimensional space to 1D space |
| Group B | P. Zezula, C. Gennaro | Masaryk University (Czech Republic) | Using DHT to decentralize centralized similarity indexes |
| Group C | W.-C. Lee | Pennsylvania State University | Using tree-like topology |

(1) Research group A (Fig. 4). Representative researchers: Ooi (5 contributions), Jagadish (4 contributions). The main contribution of group A is enhancing the efficiency of multidimensional query processing. B.C. Ooi often adopts distributed tree structures to process multidimensional queries, e.g., BATON (Jagadosj et al., 2005), BATON* (Jagadish et al., 2006a) and VBI-tree (Jagadish et al., 2006b). In addition, space filling curves (Faloutsos and Roseman, 1989; Orenstein and Merrett, 1984; Faloutsos, 1986; Jagadish, 1990) are often used as a utility to divide the whole space into equal cells, which are later encoded, e.g., Z-net (Shu et al., 2005).

(2) Research group B (Fig. 5). Representative researcher: Zezula. The main contribution of this group is the study of similarity search in P2P networks. Zezula combined iDistance (Jagadish et al., 2005) (a centralized index for similarity searching) with DHTs (such as Chord and CAN) to invent new structures (Falchi et al., 2005; Novak and Zezula, 2006).

(3) Research group C (Fig. 6). Representative researcher: Lee. This group is similar to group A, and it often adopts distributed tree structures to index multidimensional data.

Fig. 4. Visualization of group A.

Fig. 5. Visualization of group B.

5. Summary and comparison

In this section, we summarize P2P-based MI methods and compare them in table form. To compare P2P-based MI methods, we combine traditional index measurements with P2P overlay features and use the specifications listed in Table 3.

Table 3
Specifications for comparison.

| Specification | Explanation |
|---|---|
| Query type | Which type of query to support |
| Space partition | How to partition the whole multidimensional space |
| Topology | How the underlying peers connect to each other, in which kind of overlay or shape |
| Data type | Which kind of multidimensional data to support, point data or spatial data (the data with spatial extent) |
| Lookup method | How to search in the index |
| Lookup performance | Measurement of lookup latency in index |
| Peer join/leave | Measurement of cost for peer join/leave the index |
| Data insertion/deletion | Measurement of cost for inserting/deleting data |
| Load balancing | Load balancing method |
| Update strategy | Strategy for index update to enhance scalability |

We list some representative methods in Table 4. The word "none" in the table means that the authors did not mention the corresponding specification in their publication.

Fig. 6. Visualization of group C.

Table 4
Summary and comparison of P2P-based MI.

| P2P-based MI | Query type | Space partition | Topology | Data type | Lookup method | Lookup performance | Peer join/leave | Data insertion/deletion | Load balancing | Update strategy |
|---|---|---|---|---|---|---|---|---|---|---|
| P2PR-tree | Window query | Quadtree + R-tree (static + dynamic) | Tree | Spatial data | Routing along the path leaf -> root -> leaf | O(log N) | O((log N)^2) | O(log N) | None | None |
| NR-tree | Window query, kNN query | KD-tree-like partition | CAN-like + Tree | Spatial data | Distributed R*-tree lookup protocol between super-peer and peer; super-peers use CAN to communicate with each other | O(log N + dN^{1/d}), where d is the dimensionality of the space | O(log N + 2d) | O(log N + dN^{1/d}) | None | Super-peer promotion |
| P2PRdNN-tree | RNN query | Overlapping regions | Tree | Spatial data | Main channel + domain search using distributed tree | O(log N) | None | None | None | None |
| DiST | Window query | KD-tree partition | CAN-like | Point data | Through neighbors (space-adjacent peers) | O(dN^{1/d}) | O(dN^{1/d} + 2d) | O(dN^{1/d}) | None | Piggyback update |
| GHT* | Range query | Dynamic, through split operation | Incomplete tree | Point data | Client/server | No guarantee, depends on how server nodes connect to each other | None | No guarantee | None | Image adjustment |
| SIMPEER | Range query, kNN query | Partition by clustering | Super-peer-based tree-like | Point data in metric space | Mapping metric space to 1d interval, super-peer-based lookup mechanism, local search using B+-tree | No guarantee, depends on the links between super-peer and peers | None | None | None | None |
| SiMPSON | Range query, kNN query | Partition by clustering | Any structured P2P | Point data in metric space | Mapping metric space to 1d interval, underlying P2P interval search | Depends on underlying P2P overlay | None | None | None | None |
| MAAN | Window query | Hashing point to peer | Ring | Point data | Similar to Chord, but a d-dimensional query uses the Chord-like lookup mechanism d times | O(d log N) | O(d(log N)^2) | O(d log N) | None | None |
| Mercury | Window query | Hashing point to peer | Multiple rings | Point data | Similar to Chord; uniquely, Mercury uses a histogram to decide which dimension should be queried first | O(d log N) | O(d(log N)^2) | O(d log N) | Histogram-based load balancing | None |
| PRoBe | Window query | Load-equal split | CAN-like | Both point and spatial data | CAN-like query mechanism; uniquely, PRoBe uses a cache to speed up queries | O(dN^{1/d}) | O(2d) | O(dN^{1/d}) | Virtual node | None |
| MURK | Window query | KD-tree-like load-equal split | CAN-like | Both point and spatial data | CAN-like query mechanism, sometimes using links between remote peers | O(dN^{1/d}) | O(2d) | O(dN^{1/d}) | Both load movement and peer movement | None |
| BATON* | Window query | Hashing points to peers, space filling curve | Tree | Point data | Tree search mechanism using neighbors, parent and children | O(log_m N) | O(m log_m N) | O(log_m N) | Both load movement and peer movement | None |
| VBI-tree | Window query | Any partition | Binary tree | Both point and spatial data | Tree search mechanism using neighbors, parent, ancestor, and children | O(log N) | O(log N) | O(log N) | Both load movement and peer movement | None |
| RT-CAN | Window query, kNN query | CAN-like partition | CAN-like | Both point and spatial data | CAN-like query mechanism | O(log N) | O(log N) | O(log N) | None | According to the ratio of query frequency to update frequency, adjust the level of the nodes to be published |
| DHR-tree | Range query | Overlapping regions | P-tree | Both point and spatial data | Using Hilbert curve and P-tree query mechanism | O(log N) | O(log N) | O(log N) | None | None |
| Distributed Quadtree | Window query | Quadtree partition | Chord | Spatial data | Using a hash function to hash the spatial query to Chord, then using the Chord query mechanism to retrieve objects | O(log N) | O((log N)^2) | O(log N) | None | None |
| DKDT | Range query, kNN query | KD-tree-like partition | Tree + DHT | Point data | Tree search mechanism + underlying DHT search mechanism | O(log N) | None | None | None | None |
| m-LIGHT | Window query | KD-tree-like partition | Tree | Point data | Tree search mechanism + m-naming function | O(log N) | O(log N) | O(log N) | None | Incremental index maintenance and data-aware splitting |
| SkipIndex | Window query, kNN query | KD-tree-like partition | SkipGraph | Point data | KD-tree traversal + SkipGraph query mechanism | O(log N) | O(log N) | O(log N) | None | None |
| Squid | Window query | Hilbert curve | Chord | Point data | Using Hilbert curve to map the spatial query to 1d space, then using the Chord query mechanism | O(log N) | O((log N)^2) | O(log N) | Load balancing at peer join and runtime | None |
| SCRAP | Window query | Hilbert curve | SkipGraph | Point data | Using Hilbert curve to map the spatial query to 1d space, then using the SkipGraph query mechanism | O(log N) | O(log N) | O(log N) | Both load movement and peer movement | None |
| Z-NET | Window query | Z-curve | SkipGraph | Point data | Similar to SCRAP | O(log N) | O(log N) | O(log N) | Load balancing at peer join and runtime | None |
| M-Chord | Range query, kNN query | Voronoi diagram partition | Chord | Point data | iDistance + Chord query mechanism | O(log N) | O((log N)^2) | O(log N) | None | None |
| MCAN | Range query | CAN-like | CAN | Point data | Using CAN query mechanism | O(dN^{1/d}) | 2d | None | None | None |
| LINP | Range query | VA-file | Unstructured P2P | Point data | Using underlying P2P overlay and approximation in VA-file to search | No guarantee, depends on TTL | Constant | None | None | None |
| P2P Delaunay | Range query | Delaunay triangulation | Delaunay triangle | Point data | Half-moon flooding and greedy routing | No guarantee, depends on the radius of half-moon and diameter of network | No guarantee | None | None | None |
| SWAM | Range query, kNN query | Voronoi diagram partition | Voronoi diagram + small world | Point data | Greedy routing using Voronoi diagram | O(log N) | None | None | None | None |
| EZSearch | Range query, kNN query | Partition by clustering | Zig-Zag hierarchy | Point data | Tree-like traversal top/down | O(log N) | None | None | Peers take turns to become the cluster-head | None |
| SPATIALP2P | Window query | Partition by grid | CAN-like + long links | Point data | Using neighbors and long links | O(log N) | None | None | None | None |

6. Extending centralized MI

6.1. Extending R-tree

P2PR-tree (Mondal et al., 2004): This structure organizes peers into a hierarchical tree overlay, and each peer expresses the spatial data it contains as a peerMBR. First, the whole universe is statically divided into 4 equal blocks, and then each block is statically subdivided into 4 equal groups. In each group, peers communicate with each other to form a node of the tree. If the number of peers in a group exceeds a threshold, the group must be split into two subgroups. A distributed tree is thus built hierarchically by such splitting. All peers are located at the leaf nodes, and each peer must maintain a path from itself to the root, which provides navigation information during spatial search. When a query is issued to a peer, the peer estimates the approximate destination peer and delivers the search message to the next hop; the query is routed from the bottom up to the root and then from the root down to the destination. The main flaw of P2PR-tree is that it is an unbalanced tree, so if the data is skewed, some peers must maintain extremely long path information, which lowers search performance.

NR-tree (Liu et al., 2005): NR-tree is a distributed version of the R*-tree (Beckman et al., 1990). Peers are classified into super-peers (Yang and Garcia-Molina, 2003) and passive-peers. One super-peer manages a certain number of passive-peers, and together they form a cluster. If the number of passive-peers exceeds a threshold, the cluster is split as in CAN (Ratnasamy et al., 2001). Super-peers in different clusters form a CAN overlay to communicate with each other. In each cluster, the super-peer is responsible for building the distributed R*-tree. First, the super-peer and the passive-peers build their R*-trees locally. Then, the passive-peers send their summarized spatial information, i.e., internal nodes of their R*-trees, to the super-peer. The level of the internal nodes to be sent depends on the room left in the super-peer's root for accommodating the incoming nodes: the less room, the higher the level. If the insertion of nodes from passive-peers causes overflow, a split is carried out by the super-peer. NR-tree supports window queries and kNN queries. When a passive-peer issues a query, it first sends the query to its super-peer. Based on the NR-tree, the super-peer redelivers the query to the passive-peers that possibly contain the answers, according to intersection or mindist (Roussopoulos et al., 1995).
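The mindist criterion mentioned for NR-tree can be illustrated with a small sketch. It computes the minimum Euclidean distance from a query point to an MBR and ranks peers by it, which is one plausible way a super-peer could decide which passive-peers to contact first for a kNN query. The names `mindist`, `rank_peers` and the peer MBRs are assumptions for illustration; the actual NR-tree forwarding protocol is not specified in this survey.

```python
import math

def mindist(point, mbr):
    """Minimum Euclidean distance from a point to a box ((low, high), ...).

    The distance is 0 when the point lies inside the box.
    """
    total = 0.0
    for x, (lo, hi) in zip(point, mbr):
        if x < lo:
            total += (lo - x) ** 2
        elif x > hi:
            total += (x - hi) ** 2
    return math.sqrt(total)

def rank_peers(point, peer_mbrs):
    """Order peer names by the mindist between the query point and their MBR."""
    return sorted(peer_mbrs, key=lambda name: mindist(point, peer_mbrs[name]))

# Hypothetical summaries that a super-peer holds for three passive-peers.
peer_mbrs = {
    "p1": ((0, 2), (0, 2)),
    "p2": ((5, 6), (0, 2)),
    "p3": ((3, 4), (3, 4)),
}
print(rank_peers((1, 1), peer_mbrs))
```

Because mindist is a lower bound on the distance to any object inside the MBR, a peer whose mindist already exceeds the current k-th best distance cannot contribute to the answer and need not be contacted.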

P2PRdNN-Tree (Chen et al., 2006): P2PRdNN-Tree is designed for reverse nearest neighbor (RNN) queries (Korn and Muthukrishnan, 2000; Yang and Lin, 2001; Stanoi et al., 2000). Like NR-tree, P2PRdNN-Tree is a super-peer-based P2P topology. A small subset of peers with relatively high stability and computing capacity are selected as super-peers. Unlike in NR-tree, super-peers in P2PRdNN-Tree use a main channel to deliver messages, which is effectively broadcast. This increases communication overhead when queries are routed. Peers send their MBR and $L_{dnn}$ information (a list of distances, each of which is the distance from a data point to its nearest neighbor) to their super-peers. A super-peer reorganizes this information and builds an RdNN tree (Yang and Lin, 2001), an index tree for RNN queries based on the R-tree. Routing is similar to NR-tree, except that messages among super-peers are delivered by broadcast.
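The role of the nearest-neighbor distance list can be shown with a brute-force sketch. It relies on the standard RNN criterion that a data point p is a reverse nearest neighbor of q when q is at least as close to p as p's current nearest neighbor. The function names and sample points are illustrative, the points are assumed to be distinct, and no tree or super-peer routing is modeled.

```python
import math

def build_dnn(points):
    """Map each point to the distance to its nearest neighbor in the data set."""
    return {p: min(math.dist(p, o) for o in points if o != p) for p in points}

def rnn_query(q, dnn):
    """Points whose nearest-neighbor distance is not smaller than dist(p, q)."""
    return [p for p, d in dnn.items() if math.dist(p, q) <= d]

points = [(0, 0), (1, 0), (5, 0), (9, 0)]
dnn = build_dnn(points)          # {(0,0): 1.0, (1,0): 1.0, (5,0): 4.0, (9,0): 4.0}
print(rnn_query((3, 0), dnn))
```

Precomputing these distances is what lets an RNN query be answered with one containment-style test per point, instead of running a nearest neighbor search for every point at query time.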

6.2. Extending KD-tree

DiST (Nam and Sussman, 2006): DiST is a fully decentralized structure that supports multidimensional queries. DiST uses the KD-tree (Bentley, 1975) partition method to divide the space into cells. Each peer is responsible for one cell and builds a local index. The global index is distributed across all peers; each peer has only a partial view of the global index, and this partial view is used for navigation during search. Peer joining, peer leaving and searching are similar to those in the CAN topology.

6.3. Extending GHT

GHT (Generalized Hyperplane Tree) (Uhlmann, 1991) is a centralized binary tree supporting similarity search in metric spaces. Objects are stored in buckets, which are the leaf nodes of the GHT. Each internal node contains two pointers to descendant nodes (sub-trees), represented by a pair of objects called the pivots. Pivots carry the routing information used by the bucket location algorithms. If the number of objects in a bucket B0 exceeds a certain threshold, B0 is split. The split chooses a pair of pivots P1 and P2 (P1 ≠ P2) from B0 and moves all objects O that are closer to P2 than to P1 into a newly created bucket B1. The pivots P1 and P2 are then placed into a new internal node, and the tree grows by one more level. To perform a similarity range search (q, r) for the query object q and the search radius r, the GHT is traversed recursively: the left child of an internal node is visited if d(P1, q) − r ≤ d(P2, q) + r, and the right child is visited if d(P1, q) + r > d(P2, q) − r. If both conditions hold, both children are searched. The splitting of a bucket in GHT is illustrated in Fig. 7.
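The split and the range search described above can be sketched in a few lines. This is a minimal illustration, not the implementation of Uhlmann (1991): the class names `Bucket` and `Inner`, the use of Euclidean distance as the metric d, and the choice to keep the pivots inside the buckets after a split are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from math import dist  # Euclidean distance stands in for the metric d (assumption)


@dataclass
class Bucket:
    """Leaf node: a plain list of stored objects."""
    objects: list = field(default_factory=list)


@dataclass
class Inner:
    """Internal node holding the two pivots and the two sub-trees."""
    p1: tuple
    p2: tuple
    left: object   # objects at least as close to p1 as to p2
    right: object  # objects strictly closer to p2


def split(bucket, p1, p2, d=dist):
    """Split a bucket B0 around pivots p1 != p2 into B0 (left) and B1 (right)."""
    b0 = Bucket([o for o in bucket.objects if d(p1, o) <= d(p2, o)])
    b1 = Bucket([o for o in bucket.objects if d(p2, o) < d(p1, o)])
    return Inner(p1, p2, b0, b1)


def range_search(node, q, r, d=dist):
    """Return every stored object o with d(q, o) <= r."""
    if isinstance(node, Bucket):
        return [o for o in node.objects if d(q, o) <= r]
    hits = []
    if d(node.p1, q) - r <= d(node.p2, q) + r:   # left child may hold answers
        hits += range_search(node.left, q, r, d)
    if d(node.p1, q) + r > d(node.p2, q) - r:    # right child may hold answers
        hits += range_search(node.right, q, r, d)
    return hits
```

Both tests follow from the triangle inequality, so a sub-tree is skipped only when no object in it can lie within r of q.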

GHT* (Batko et al., 2005): GHT* is a distributed version of GHT. However, GHT* is not a P2P topology in nature. GHT* consists of server nodes and client nodes. Server nodes can insert, store, and retrieve objects, and they maintain data objects in a set of buckets; client nodes can only insert objects and issue queries. An Address Search Tree (AST), a structure similar to GHT, is maintained in every node. Every leaf of the AST includes exactly one pointer to either a bucket or a server node holding the data. When a node wants to deliver a query or an insertion message, it uses its AST to navigate to the correct bucket or server node. The AST is updated lazily, which keeps the replication performance at approximately O(log N).

P2PAKNNS (Yu and Yu, 2007): P2PAKNNS is an adaptation of GHT* to high-dimensional KNN search. Its main contribution is a new similarity function, HDSim, for measuring the distance between objects. HDSim can adaptively determine the size of the hyper-sphere and avoids the problem that Lk-norms lead to non-contrasting distances in high-dimensional space. Experiments have shown that HDSim is more appropriate for high-dimensional space than Euclidean distance.

6.4. Extending iDistance

iDistance (Jagadish et al., 2005) is a centralized indexing method for similarity search. It partitions the data space into n clusters and selects a reference point Ki (cluster center) for each cluster Ci. Each data object is assigned a one-dimensional iDistance value according to the distance to its cluster's reference point. iDistance uses a constant parameter c to separate individual clusters, and the iDistance value for an object x ∈ Ci is iDist(x) = i ∗ c + dist(Ki, x), where dist(a, b) means the distance from a to b. The chosen parameter c must be large enough to map all objects in cluster i to the interval [i ∗ c, (i + 1) ∗ c]. Based on this key idea, a B+-tree can be used to index the one-dimensional iDistance values, so similarity search is transformed into interval search. For a range query R(q, r), for each cluster Ci that satisfies the inequality dist(Ki, q) − r ≤ ri, where ri is the radius of cluster Ci, the interval [i ∗ c + dist(Ki, q) − r, i ∗ c + dist(Ki, q) + r] is retrieved using the B+-tree. The candidate objects pi are then evaluated using the inequality dist(pi, q) ≤ r to produce the results. Nearest neighbor search is based on repeated range queries with growing radius. The mapping mechanism of iDistance is shown in Fig. 8.
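The mapping and the range query above can be illustrated with a small sketch. It is not the original iDistance implementation: a sorted Python list searched with `bisect` stands in for the B+-tree, points are assigned to their nearest reference point, Euclidean distance is used, and the scanned interval is additionally clamped to the key range of cluster i; all of these, like the function names, are assumptions made for the example.

```python
from bisect import bisect_left, bisect_right
from math import dist  # Euclidean distance, Python 3.8+ (assumed metric)


def build_index(points, centers, c):
    """Map every point x in cluster i to the key i*c + dist(K_i, x)."""
    entries, radii = [], [0.0] * len(centers)
    for p in points:
        # Assumption: a point belongs to the cluster of its nearest reference point.
        i = min(range(len(centers)), key=lambda j: dist(centers[j], p))
        d = dist(centers[i], p)
        if d >= c:
            raise ValueError("c is too small to separate the clusters")
        radii[i] = max(radii[i], d)
        entries.append((i * c + d, p))
    entries.sort()                      # sorted list stands in for the B+-tree
    keys = [k for k, _ in entries]
    return keys, entries, radii


def range_query(index, centers, c, q, r):
    """Return all indexed points p with dist(p, q) <= r."""
    keys, entries, radii = index
    result = []
    for i, k_i in enumerate(centers):
        dq = dist(k_i, q)
        if dq - r > radii[i]:           # query sphere misses cluster i
            continue
        lo = i * c + max(dq - r, 0.0)   # clamped to cluster i's key range
        hi = i * c + min(dq + r, radii[i])
        for _, p in entries[bisect_left(keys, lo):bisect_right(keys, hi)]:
            if dist(p, q) <= r:         # refinement of the candidates
                result.append(p)
    return result
```

Only the key intervals of clusters that the query sphere can reach are scanned; every candidate is still checked against the true distance, because two points with close keys need not be close in the original space.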

SIMPEER (Doulkeridis et al., 2007b): SIMPEER is a super-peer-based P2P framework adopting the iDistance idea for similarity search in metric space. SIMPEER is a three-level clustering scheme, which consists of local clustering, clustering among a super-peer and its peers, and clustering among super-peers. The resulting clusters are used to index local points using iDistance. Peers send their cluster descriptions to their super-peers, and each super-peer processes these cluster descriptions to form hyper-clusters. Based on the cluster descriptions from its peers and the hyper-clusters it generates, each super-peer maps the clusters it knows to one-dimensional values, just as iDistance maps each object. Each cluster Cj is mapped to a one-dimensional value based on the nearest hyper-cluster center Oi using the formula keyj = i ∗ c + [dist(Oi, Kj) + rj]. This extension of iDistance is used by a super-peer to decide which of its peers should process a query, and its search procedure is similar to that of iDistance. Hyper-clusters are communicated among super-peers and are further summarized to build a set of routing clusters. These are maintained at the super-peer level and are used for routing a query across the super-peer network.

Fig. 7. Splitting of bucket in GHT.

Fig. 8. Mapping from metric space to 1D space in iDistance.

SiMPSON (Vu et al., 2009): Similar to SIMPEER, SiMPSON is a P2P index for similarity search in metric space. However, SiMPSON does not adopt a super-peer-based P2P topology as SIMPEER does; instead, any structured P2P system supporting single-dimensional range queries can serve as SiMPSON's underlying topology. Every peer initially clusters its data locally, and then, similar to iDistance, every cluster is mapped into two one-dimensional values named the starting index and the ending index. With these two values, SiMPSON is able to shorten the search interval compared with SIMPEER, which can reduce the cost of similarity search.

7. Extending P2P

7.1. Extending Chord

MAAN (Cai et al., 2003): MAAN (Multi-Attribute Addressable Network) is focused on multi-attribute range queries in P2P systems. It uses locality-preserving hashing to map attribute values onto Chord. This approach either uses a direct mapping of the data domain to the Chord space or assumes that the input data range and distribution are known in advance to create a balanced mapping. Supposing that each object Oi has n attributes, MAAN maps these n attribute values to Chord, so at most n peers keep the information of Oi together with all n attribute values. This procedure is called the registration of Oi. MAAN supports the iterative query and the single-attribute-dominated query. The former is straightforward: for a range query on m attributes, MAAN produces m sub-queries and intersects their results at the query originator to generate the final results. The latter uses the fact that each peer keeps all attribute values of an object, so the peer can decide which of the objects it holds satisfy the query. That is, MAAN picks just one of the m attributes as the query to search in Chord; every peer that possibly contains results uses the sub-queries on the other (m − 1) attributes to filter its results and then sends the final results back to the query originator.
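The registration step and the single-attribute-dominated query can be sketched as follows. This is an illustration under stated assumptions, not MAAN's code: the linear hash H(v) = (v − v_min) / (v_max − v_min) · (2^m − 1) is the direct mapping for a known, uniformly distributed domain, the ring is a sorted list of peer identifiers rather than a real Chord overlay, and the names `lp_hash`, `register` and `dominated_query` are hypothetical.

```python
from bisect import bisect_left

M = 8  # number of identifier bits on the ring (assumed for the example)


def lp_hash(v, v_min, v_max, m=M):
    """Order-preserving map of [v_min, v_max] onto the identifiers 0 .. 2^m - 1."""
    if not v_min <= v <= v_max:
        raise ValueError("value outside the attribute domain")
    return int((v - v_min) * (2 ** m - 1) // (v_max - v_min))


def successor(ring, key):
    """First peer id >= key on the sorted ring, wrapping to the smallest id."""
    return ring[bisect_left(ring, key) % len(ring)]


def register(store, ring, obj, domains):
    """Registration: the full object is stored once per attribute value."""
    for attr, (lo, hi) in domains.items():
        peer = successor(ring, lp_hash(obj[attr], lo, hi))
        store.setdefault((peer, attr), []).append(obj)


def dominated_query(store, ring, query, domains, dominant):
    """Walk the peers covering the dominant attribute's range; filter locally."""
    lo, hi = query[dominant]
    k_lo = lp_hash(lo, *domains[dominant])
    k_hi = lp_hash(hi, *domains[dominant])
    results, i = [], bisect_left(ring, k_lo)
    while True:
        peer = ring[i % len(ring)]
        for obj in store.get((peer, dominant), []):
            # Each peer holds all attribute values, so it can test every sub-query.
            if all(a <= obj[attr] <= b for attr, (a, b) in query.items()):
                results.append(obj)
        if i >= len(ring) or peer >= k_hi:   # wrapped around or covered k_hi
            break
        i += 1
    return results
```

Because the hash preserves order, the values of a range land on a contiguous arc of the ring, which is why a single walk along successors is enough for the dominant attribute.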

Mercury (Bharambe et al., 2004): Mercury is a multi-attribute range query system similar to MAAN. It uses multiple circular overlays and organizes the peers of the system into these overlays. Each circular overlay, called an attribute hub, is responsible for one attribute and orders its peers by the value of that attribute. Two hubs are connected through links built by the peers located in these hubs. For a multi-attribute range query, any hub h can be the receiver. The peers in h first search in a Chord-like manner for the attribute they are responsible for, and then, through the links between hubs, the query is sent to other hubs to explore the other attributes.

RangeGuard (Ntarmos et al., 2006): RangeGuard is a super-peer-based P2P system that addresses the problem of range queries. It uses a two-level ring, in which super-peers are located at the top level and normal peers at the bottom. RangeGuard assumes that super-peers are powerful enough to maintain a data range that includes the values their peers possess. If a super-peer leaves the network, a promotion is executed to produce a new super-peer. Because RangeGuard maintains two ring structures simultaneously, it incurs a higher maintenance cost.

HRing (Zhuge et al., 2008): HRing is a P2P system that uses the harmonic series to build routing tables. Similar to Chord, HRing organizes peers into a ring. Each peer maintains not only its predecessor and successor but also a routing table built with the harmonic series, with which a query can be answered in O(log N). HRing adopts the same idea as MAAN for multi-attribute range queries.

7.2. Extending CAN

PRoBe (Sahin et al., 2005): PRoBe organizes peers into a multidimensional logical space, which is similar to CAN. The dimensionality of this space is set to the number of range attributes. Each dimension corresponds to one attribute and is bounded by the domain of the corresponding attribute value. The logical space is divided into non-overlapping rectangular regions called zones. Each peer in the system is assigned a zone and is responsible for maintaining the data objects mapped to its zone. When a data item is inserted into the system, it is mapped to the point corresponding to its values for the attributes. A multidimensional range query is processed in two phases. In the first phase, the query is transformed into a hyper-rectangle in the logical space, referred to as the query box, and is routed from the initiator to a peer whose zone intersects the query box. In the second phase, the peer receiving the query forwards it to its neighbors whose zones intersect the query box. Thus, every peer whose zone intersects the query box receives the query, and all qualifying data objects are returned.
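The two phases can be sketched with zones held in a dictionary. This is a simplified illustration rather than PRoBe's protocol: phase 1 is replaced by a direct lookup of the zone containing the center of the query box (a stand-in for CAN-style routing), zones are half-open boxes, the query box is closed, and the helper names are hypothetical.

```python
from collections import deque


def overlaps(zone, box):
    """True if the half-open zone [lo, hi) intersects the closed query box."""
    return all(zlo <= bhi and blo < zhi
               for (zlo, zhi), (blo, bhi) in zip(zone, box))


def adjacent(a, b):
    """Zones are neighbors if they abut in one dimension and overlap in the rest."""
    touching = 0
    for (alo, ahi), (blo, bhi) in zip(a, b):
        if ahi == blo or bhi == alo:
            touching += 1
        elif not (alo < bhi and blo < ahi):
            return False
    return touching == 1


def locate(zones, point):
    """Stand-in for phase 1: the peer whose zone contains the given point."""
    for peer, zone in zones.items():
        if all(lo <= x < hi for x, (lo, hi) in zip(point, zone)):
            return peer
    raise KeyError(point)


def range_query(zones, data, box):
    """Phase 2: forward the query among neighboring zones that intersect the box."""
    start = locate(zones, tuple((lo + hi) / 2 for lo, hi in box))
    visited, queue, hits = {start}, deque([start]), []
    while queue:
        peer = queue.popleft()
        hits += [p for p in data[peer]
                 if all(lo <= x <= hi for x, (lo, hi) in zip(p, box))]
        for other, zone in zones.items():
            if (other not in visited and adjacent(zones[peer], zone)
                    and overlaps(zone, box)):
                visited.add(other)
                queue.append(other)
    return hits, visited
```

The forwarding is a breadth-first traversal restricted to zones that intersect the query box, so peers outside the box never see the query.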

MURK (Gamesam et al., 2004): MURK uses KD-tree division to partition the space, i.e., the dimensions are used cyclically in splitting. This partitioning method is similar to CAN, but there are a few differences between them. CAN hashes data into a multidimensional space, and since the data is expected to be uniformly distributed, a new peer splits an existing peer's data space equally. In MURK, by contrast, it is the load of a peer that is split equally. MURK also uses greedy routing for a multidimensional query, similar to CAN. However, MURK points out two shortcomings of this routing. One is non-uniformity: the number of grid neighbors of a peer is not uniform, because the data space is not split equally, and this can cause unbalanced routing load. The other is inefficiency: every peer connects only to its grid neighbors, which is not efficient for routing. To solve these problems, MURK proposes two methods: one builds skip links between peers at random; the other uses a space-filling curve and SkipGraph (Aspnes and Shah, 2003) to build skip links.

SCAN (Sun, 2007): SCAN is a structured P2P overlay that augments the CAN overlay with long links based on Kleinberg's small-world model (Kleinberg, 2000) in a d-dimensional Cartesian space. The construction of long links does not require an estimate of the network size. Queries in the multidimensional data space can be answered in O(log N) hops by equipping each node with O(log N) long links and O(d) short links.

HyperCBR (Castelli et al., 2008): HyperCBR is an approach to the problem of content-based routing in a multidimensional space. The global space is equally partitioned into grids. Peers are divided into publishers and subscribers, and they are assigned to the crossing points of the grid. A subscriber sends subscriptions and propagates them along the entire column it belongs to. A publisher generates events and disseminates them along a row that crosses all the columns. When an event hits a column containing matching subscriptions, it is “captured” by that column and duplicated along it by following the path established by the subscriptions. This routing mechanism is costly because it behaves like flooding.

7.3. Extending BATON

BATON (Jagadosj et al., 2005) is an overlay structure based on a balanced binary tree in which each peer in the network maintains a node of the tree. A node may connect to other nodes by up to four different kinds of links: parent links pointing to the parent node, children links pointing to child nodes, adjacent links pointing to adjacent (in linear order) nodes that maintain adjacent ranges of values, and neighbor links pointing to selected neighbor nodes at the same level whose distance from the node is a power of two. In BATON, each node in the tree, both leaf and internal, is assigned a range of values; the range directly managed by a node must be greater than the range managed by its left adjacent node and smaller than the range managed by its right adjacent node. When a node receives a query request and the searched value does not fall into its own range of values, the request is forwarded to (1) a node in its left routing table whose upper bound is still greater than the searched value, or (2) a node in its right routing table whose lower bound is still lower than the searched value, if such a node exists; otherwise, (3) the query request is forwarded to either its left/right child or its left/right adjacent node. On node leaving or load balancing, BATON has to be restructured. Fig. 9 shows a BATON structure.

BATON* (Jagadish et al., 2006a): BATON* is an extension of BATON that reduces the search cost from log_2 N to log_m N (m > 2) by enlarging the fanout of the tree. In BATON*, each peer node can have up to m children instead of 2 as in BATON. The neighbor routing tables at a node maintain links to selected neighbor nodes at the same level whose distance from the node itself equals d ∗ m^i, where d = 1, ..., m − 1 and i ≥ 0. Due to the larger fanout, the number of delivery hops is reduced, but the cost of maintaining the tree structure increases. BATON* dynamically adjusts the fanout to trade off query cost against maintenance cost according to the ratio of query frequency to update frequency. BATON* supports multi-attribute range queries. It divides all m attributes into two groups according to their query frequency (supposing that n of these m attributes are frequently queried). Then, BATON* divides the whole tree structure into n + 1 sections. Each of the first n sections indexes one frequently queried attribute. For the last section, the remaining m − n attributes are mapped into a one-dimensional space with the Hilbert space-filling curve, and this one-dimensional space is indexed by the last section of the tree.
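The neighbor distances d ∗ m^i can be enumerated directly. The helper below is only an illustration of that formula; the function name and the `level_width` parameter (the number of positions on a tree level) are assumptions, not part of BATON*.

```python
def neighbor_offsets(m, level_width):
    """Distances d * m**i (d = 1 .. m-1, i >= 0) that fit within one tree level."""
    offsets = []
    power = 1                      # m**i
    while power < level_width:
        for d in range(1, m):
            if d * power < level_width:
                offsets.append(d * power)
        power *= m
    return offsets
```

With m = 2 the offsets reduce to the powers of two used by BATON; a larger m yields more links per node, which illustrates the trade-off between fewer hops and a higher maintenance cost.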

VBI-tree (Jagadish et al., 2006b): VBI-tree is a variation of BATON that supports multidimensional queries. The peers are partitioned into two classes: data peers (leaf nodes in the tree) and routing peers (internal nodes in the tree). Data peers actually store data, while routing peers only keep routing information. As in BATON, each routing peer in the VBI-tree maintains links to its parent, its children, its adjacent nodes and its sideways routing tables. Moreover, each routing peer maintains an upside table with information about the regions covered by each of its ancestors. When a query is issued to a peer, the peer first decides which peer to forward the query to, according to the relationship between its own region and the query and the relationship between the regions of other peers and the query, using its routing tables or upside tables. The forwarding procedure is then similar to that in BATON, and queries can be answered within O(log N) hops.

SDI (Zhang et al., 2009): SDI is an improvement of VBI-tree that reduces the number of links to lower the maintenance cost. SDI argues that the upside path links in VBI-tree cost too much and should be replaced with ancestor links, which are fewer than the upside path links. Experiments show that the query performance of SDI is improved.

8. Combining centralized MI and P2P

8.1. Combining R-tree and P2P

RT-CAN (Wang et al., 2010): The research context of RT-CAN is cloud computing, which is a new area for P2P-based MI. RT-CAN combines CAN and R-tree to index multidimensional data in cloud systems. Each peer in the system is assigned two roles: storage node and overlay node. The storage node maintains a portion of the application data and builds an R-tree for its local data. The overlay node is a node in the structured overlay, CAN, and is responsible for a partition of CAN. The storage node selects the nodes in the level above the leaf level of the R-tree and publishes them to CAN by interacting with the overlay node. RT-CAN adopts the following publishing strategy: if the radius of the R-tree node N to be published is lower than a threshold, N is sent to the cluster node whose zone contains N's center; otherwise, all CAN nodes overlapping with N need to maintain N in their global index. A window query is first routed to the CAN node C whose zone contains the center of the query window; the query is then recursively forwarded to all neighbors that overlap with the query window; on receiving the query, the storage node searches its local index and returns the results.

DHR-tree (Wei and Sezaki, 2006): DHR-tree combines Hilbert R-tree (Kamel and Faloutsos, 1994) and P-tree (Crainiceanu et al., 2004), a P2P overlay that is a distributed version of the B+-tree. Each peer p is assumed to control some multidimensional data objects enclosed by an MBR in the multidimensional space, so p is associated with a key that is the smallest Hilbert code among the vertices of the MBR controlled by p. P-tree is then employed as the overlay network to organize peers according to their keys. For a window query, the query originator executes an R-tree-style search, i.e., from the root of the P-tree to every node that intersects the query window.

Fig. 9. BATON structure.

8.2. Combining Quadtree and P2P

Distributed Quadtree (Tanin et al., 2007): Distributed Quadtree uses the MX-CIF quadtree to partition the space. The partition marks every spatial object with several control points. A consistent hash function (e.g., SHA-1) can be used to map these control points into one-dimensional values that can be indexed by Chord. Thus, with these control points, a spatial object or a spatial query can be indexed or searched by peers in Chord. Notice that a spatial object can be indexed by more than one peer in Chord, which increases the maintenance cost.

CAN-QTree (Zuo et al., 2008): CAN-QTree combines CAN and quadtree, and is similar to Distributed Quadtree, which combines Chord and quadtree (Finkel and Bentley, 1974). CAN-QTree also maps the control points of spatial objects to values that can be indexed in CAN, just as Distributed Quadtree does. With CAN, CAN-QTree provides O(N^(1/2)) search performance, which is not as good as that of Distributed Quadtree.

8.3. Combining KD-tree and P2P

DKDT (Gao and Steenkiste, 2007): DKDT combines Chord and KD-tree to solve the problem of similarity search in P2P systems. First, it uses KD-tree to recursively partition the space until the number of data points within a cell becomes less than or equal to a given bucket size, and then uses a hash function to map each cell onto the underlying overlay network. The peers maintain the cells mapped onto the ring in a Chord manner, i.e., each peer maintains the cells between its predecessor and itself. Moreover, in DKDT, peers are maintained not only in the DHT way but also in a KD-tree way, which is implemented by distributed tree maintenance. Every peer maintains a local database called the Tree Information Base (TIB), which is a partial view of the whole tree. Peers constantly send probing messages to establish and maintain the TIB. We believe this costs too much, because two structures (i.e., Chord and KD-tree) have to be maintained constantly and they do not match each other. For a range query, the query originator uses its TIB to determine the lowest node (peer) in the tree that covers the query point, and then sends the query to that peer. The peer receiving the query not only runs local retrieval but also forwards the query according to the TIB it holds. The query originator also maintains a priority queue to help refine the results and answer KNN queries.

m-LIGHT (Tang et al., 2009): m-LIGHT is very similar to DKDT. It uses KD-tree to partition the space, and each leaf node in the KD-tree is assigned a key using an m-dimensional naming function, which exploits the relationship between the leaf node and its lowest ancestor. With this key, every leaf node can be managed by a peer in the underlying DHT. Insertion, query and deletion are similar to those in DKDT, the only difference being that m-LIGHT uses a data-aware splitting strategy to achieve load balancing.

SkipIndex (Zhang et al., 2004): SkipIndex combines SkipGraph and KD-tree to solve the high-dimensional indexing problem. It uses KD-tree to partition the space, and then, according to the splitting history, each leaf node of the KD-tree gets a string composed of 0s and 1s, which is the key of that node. With this key, a leaf node can be associated with a peer in SkipGraph, so the global space is distributed across the peers. Peers can then follow the SkipGraph operations to insert, delete and search data objects.
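How a splitting history turns into a key of 0s and 1s can be shown with a short sketch. It assumes midpoint splits that cycle through the dimensions and a fixed depth, which is a simplification: an actual KD-tree chooses split positions from the data. The function name `leaf_key` is hypothetical.

```python
def leaf_key(point, bounds, depth):
    """Bit string of the KD-tree leaf containing point after `depth` splits."""
    bounds = [list(b) for b in bounds]     # per-dimension [lo, hi], copied
    bits = []
    for level in range(depth):
        dim = level % len(point)           # dimensions are used cyclically
        lo, hi = bounds[dim]
        mid = (lo + hi) / 2                # assumed midpoint split
        if point[dim] < mid:
            bits.append("0")               # lower half
            bounds[dim][1] = mid
        else:
            bits.append("1")               # upper half
            bounds[dim][0] = mid
    return "".join(bits)
```

Points that fall into the same cell share the same key, and cells that were split apart late in the history share a long common prefix.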

8.4. Combining Space Filling Curves and P2P

Squid (Schmidt and Parashar, 2003): Squid uses the Hilbert space-filling curve to map the multidimensional space into a one-dimensional space that can be indexed by Chord. The operations of peer joining and peer departure are the same as in Chord. For a window query, the query box is transformed into a series of one-dimensional interval queries, which are delivered into Chord. If a query box is very large, there may be a large number of one-dimensional interval queries, which would cause a flooding-like search. To avoid this situation, Squid uses a query optimization that recursively refines the query with a prefix tree. This query tree is embedded onto the ring topology of the overlay network, so many peers that do not contain valid data elements are pruned away.
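The transformation of a query box into one-dimensional intervals can be illustrated for the two-dimensional case. The sketch below uses the standard iterative Hilbert index computation on an n × n grid and simply enumerates the cells of the window; it does not implement Squid's recursive refinement with a prefix tree, and the function names are assumptions.

```python
def hilbert_index(n, x, y):
    """Position of cell (x, y) along the Hilbert curve on an n x n grid
    (n must be a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                        # rotate/flip the quadrant
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def window_to_intervals(n, x_lo, x_hi, y_lo, y_hi):
    """Merge the Hilbert indices of all cells in the window into intervals."""
    keys = sorted(hilbert_index(n, x, y)
                  for x in range(x_lo, x_hi + 1)
                  for y in range(y_lo, y_hi + 1))
    intervals = []
    for k in keys:
        if intervals and k == intervals[-1][1] + 1:
            intervals[-1][1] = k           # extend the current interval
        else:
            intervals.append([k, k])       # start a new interval
    return [tuple(iv) for iv in intervals]
```

A window aligned with a curve quadrant collapses into a single interval, while a window straddling quadrant borders breaks into several, which is the effect that makes large query boxes expensive.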

CISS (Lee et al., 2004): CISS is a P2P framework for efficient multidimensional search in the context of massively multiplayer online games (MMOGs) and P2P catalog systems. CISS argues that existing DHT-based P2P systems cannot support these applications efficiently because of object declustering, which can result in significant inefficiencies in data updating and multidimensional range query routing. CISS therefore suggests using a locality-preserving function instead of a hash function, so that data objects are more clustered. Similar to Squid, CISS combines Hilbert curves with an underlying Chord topology. One difference is that, in CISS, peers use a cache to speed up update operations. Moreover, for a query box, CISS argues that the recursive refinement of the query with a prefix tree in Squid could cause a single point of failure, so CISS adopts a forwarding-based query routing protocol to optimize the query.

SCRAP (Gamesam et al., 2004): SCRAP combines SkipGraph and space-filling curves. What SCRAP and SkipIndex have in common is that both encode multidimensional objects with one-dimensional keys and associate the keys with the peers in SkipGraph, so that distributed maintenance and querying of the global MI can be implemented through SkipGraph. The difference between them is that SkipIndex uses KD-tree to obtain the keys, while SCRAP uses a space-filling curve.

Z-NET (Shu et al., 2005): Z-NET and SCRAP are almost the same; the only difference is that the authors of SCRAP propose to use a space-filling curve (Z-curve or Hilbert curve), whereas the authors of Z-NET explicitly propose to use the Z-curve.

8.5. Combining iDistance and P2P

M-Chord (Novak and Zezula, 2006): M-Chord is a distributed data structure for similarity searching in general metric spaces, which combines iDistance and Chord. M-Chord uses iDistance to map objects in the metric space to a [0, 2^m) domain of keys; the domain is then divided into intervals, and each peer in Chord is responsible for insertion, deletion and retrieval in one interval by maintaining a B+-tree. The Chord search mechanism can then be followed for similarity search in the metric space through the iDistance transformation. For a range query (q, r), global knowledge of the center Oi and the radius ri of each cluster Ci allows the query originator to transform the query into a series of intervals using iDistance; for each interval, the query originator sends it to the peer pi responsible for the midpoint of that interval. Furthermore, pi forwards the interval if it is not responsible for the whole interval. Every peer that receives the query performs local retrieval and sends back the results. For a KNN query (q, K), the query originator uses its local B+-tree to find K objects that are near q. The distance u from q to the Kth nearest object is taken as the radius of the range query (q, u), which the query originator runs in the network. The originator then picks K objects from the results as the answers to the KNN query; if fewer than K objects are returned, the originator enlarges the search radius and runs the range query again until it finds K objects.

R-Chord (Yin et al., 2009): Similar to M-Chord, R-Chord combines Chord and iDistance. However, it uses a concept called Relative Position Code (RPC) to filter objects during similarity search. An RPC is a bit string of 0s and 1s that represents the relative position between a data point and the reference point of the cluster in which the data point is located. RPC reduces the number of distance calculations and can be used to discard objects whose lower-bound distance is greater than the pruning distance. It is shown that the search time is reduced with RPC.

MCAN (Falchi et al., 2005): MCAN combines CAN and iDistance for similarity search in metric space. The key idea of MCAN is that it associates each object in the metric space with a coordinate, so the problem of searching in a metric space is transformed into that of searching in a CAN space. The coordinate of every object is computed as follows: let P1, ..., PN be the N pivots selected from the metric dataset; for each object O, F(O) = (d(O, P1), d(O, P2), ..., d(O, PN)) is computed and assigned to that object as its coordinate. The operations of insertion and splitting are therefore very similar to those of CAN. For a range query (q, r), the query is transformed into a hypercube whose center is F(q) and whose side length is 2r. The hypercube is then delivered into the CAN network, and the search mechanism of CAN is followed to retrieve objects.
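The pivot mapping F and the hypercube filter can be sketched as follows. The example replaces the CAN overlay with a plain list scan and uses Euclidean distance as the metric d; the function names are assumptions, and only the mapping and the filtering logic are meant to reflect the description above.

```python
from math import dist  # Euclidean distance stands in for the metric d (assumption)


def to_coord(o, pivots, d=dist):
    """F(O) = (d(O, P1), ..., d(O, PN)): the coordinate of O in the CAN space."""
    return tuple(d(o, p) for p in pivots)


def in_hypercube(coord, center, r):
    """True if coord lies in the hypercube with the given center and side 2r."""
    return all(abs(c - q) <= r for c, q in zip(coord, center))


def range_query(objects, pivots, q, r, d=dist):
    """Filter by the hypercube around F(q), then refine with the true distance."""
    fq = to_coord(q, pivots, d)
    candidates = [o for o in objects
                  if in_hypercube(to_coord(o, pivots, d), fq, r)]
    results = [o for o in candidates if d(o, q) <= r]
    return results, candidates
```

By the triangle inequality, |d(O, Pi) − d(q, Pi)| ≤ d(O, q) for every pivot, so each object within distance r of q maps into the hypercube; the hypercube may also contain false positives, which is why the final distance check is still needed.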

8.6. Combining other centralized indexes and P2P

DP-tree (Li and Lee, 2006): Strictly speaking, DP-tree is not a structure for indexing data in multidimensional space. We mention it here because DP-tree adopts a new kind of load balancing called wavelet-assisted load balancing, which will be described in Section 9.

LINP (Cui et al., 2007): LINP is focused on similarity search in unstructured P2P systems. It uses VA-file to partition the whole space into equal cells and assigns each cell a key, which is a bit string composed of 0s and 1s, so every multidimensional object can be represented by the key of the cell in which it is located. Each peer is responsible for one or more cells, and it maintains a routing table of the neighbors whose regions are near its own through an inverted index. Each entry in the inverted index is a binary tuple 〈id, list〉, where id denotes the key of a cell and list is a linked list of the peers that have routing information for the cell identified by id. When a peer p issues a range query, it first scans its inverted index to find the peers whose regions intersect the query according to the VA-file. Subject to a TTL constraint, p sends the query to the peers that possibly have answers, and searches locally if its own region intersects the query. Other peers that receive the query run the same procedure as p and return any answers they have.

9. Miscellaneous

9.1. Using Delaunay triangulation

P2P Delaunay (Kang et al., 2004): P2P Delaunay builds a P2P overlay topology based on Delaunay triangulation (Preparata and Shamos, 1985) and studies spatial queries under this topology. Each peer corresponds to a vertex in the Delaunay triangulation, and the overlay communication link between two peers corresponds to the edge linking the corresponding vertices. Processing a range query is divided into two phases: routing and refinement. For the routing phase, the authors propose two routing modes. One is half-moon routing, which forwards the query to the peers located within the half-moon, i.e., the flooding is limited by the half-moon. The other is greedy routing, which forwards the query to the neighbor peer (the next hop) that is nearest to the destination. The refinement phase begins when the query reaches a peer p in the query region. The peer p proceeds to find other peers falling into the query region using its list of neighbors. The results are then sent back to the query originator. Half-moon routing causes many flooding messages, which reduces system performance.

SWAM (Banaei-Kashani and Shahabir, 2004): SWAM is proposed for similarity search in P2P systems. It divides the space into cells using a Voronoi diagram (Okabe et al., 2000) and makes each peer responsible for one cell. Each peer is connected not only to its neighbors but also to some remote peers according to the small-world model (a random graph). For a range query (q, r), the query is forwarded with greedy routing, which leverages the properties of the Voronoi diagram, to the peer whose cell covers the query point q. When the query reaches the peer p responsible for q, p sends the query to its neighbors and performs a local search to answer the query.

9.2. Using distributed clustering

DESENT (Doulkeridis et al., 2007a): DESENT is focused on decentralized and distributed generation of semantic overlays in P2P networks. DESENT extracts feature vectors from the documents and thus constructs a vector space. The generation procedure is divided into 5 phases: local clustering, initiator selection, zone creation, intra-zone clustering and inter-zone clustering. After these 5 phases are finished, a hierarchical tree-like overlay network is formed. A query is routed from the root of the tree downwards to the most similar cluster until a peer is located. We believe that such a routing procedure places a heavy query load on the root.

EZSearch (Tran and Nguyen, 2008): The authors of EZSearch believe that P2P-based MI should consist of two parts: a communication architecture and an indexing architecture. For the communication architecture, the peers in EZSearch are clustered, and for each cluster, the cluster head participates in the higher-level clustering in a Zigzag mode. The procedure repeats many times, and finally the peers form a hierarchical communication architecture. For the indexing architecture, EZSearch assigns one indexing zone to each cluster. All the zones at the bottom of the hierarchy form a complete disjoint partitioning of the entire index space. At higher layers, the zone of a cluster is the union of the zones of all the clusters that call it their supercluster. Obviously, peers that appear higher in the hierarchy have to process more workload than those lower in the hierarchy, so EZSearch must adopt explicit load balancing. This will be seen in Section 9.

9.3. Others

FuzzyPeer (Kalnis et al., 2006): FuzzyPeer is an unstructured P2P network which is focused on similarity search. To avoid flooding the network with messages, FuzzyPeer utilizes the streaming results of a query to answer similar queries that were issued earlier. To achieve this, FuzzyPeer proposes the idea of “frozen” queries, i.e., some queries are frozen in the network and distributed across peers. When new results of other queries arrive at a peer, the peer checks whether a frozen query matches the query being answered; if they are similar, the results are duplicated and sent back to the originator of the frozen query.

SPATIALP2P (Kantere et al., 2009): SPATIALP2P studies how to store and index spatial data in P2P systems. SPATIALP2P argues that space-filling curves do not preserve locality and directionality, so it adopts a scheme in which each peer stores and indexes nearby data and nearby peers. Only the case of two-dimensional space is discussed in SPATIALP2P, in which the space is equally divided into cells and each cell is assigned a two-dimensional coordinate. SPATIALP2P proposes a distance metric between any two cells. The metric is a two-item tuple in which the first item is the greater of the difference between the two cells' x-coordinates and the difference between the two cells' y-coordinates, and the second item is the smaller one. Given two distances d1 and d2, if the first item in d1 is greater than that in d2, then d1 is greater than d2; if the first items are equal, the order of d1 and d2 depends on the second item. According to the distance metric, a cell is assigned to the nearest peer to maintain. Besides, each peer p is not only linked with 4 successors, one in each of the 4 quadrants centered at p, but also connected with some indexed peers that have long links to p. This is just like the peers in the finger table in Chord. A query is routed through the 4 successors and indexed peers to its destinations, as in Chord.
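
The cell distance metric described above is a lexicographic comparison of (larger coordinate difference, smaller coordinate difference). A minimal sketch follows; the function names and the peer-to-cell mapping are hypothetical and only illustrate the ordering, not the actual SPATIALP2P code.

```python
def cell_distance(a, b):
    """Distance between two grid cells a=(x, y) and b=(x, y).

    Returns (greater difference, smaller difference). Python tuples
    compare item by item, which matches the ordering in the text.
    """
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return (max(dx, dy), min(dx, dy))

def nearest_peer(cell, peer_cells):
    """Pick the peer whose cell is nearest to `cell` under the metric.

    peer_cells: dict peer name -> (x, y) cell of that peer (assumed layout).
    """
    return min(peer_cells, key=lambda p: cell_distance(cell, peer_cells[p]))

d1 = cell_distance((0, 0), (3, 1))   # first item 3, second item 1
d2 = cell_distance((0, 0), (2, 2))   # first item 2, second item 2
owner = nearest_peer((0, 0), {"p1": (3, 1), "p2": (2, 2)})
```

Here d1 is greater than d2 because its first item is larger, so the cell (0, 0) would be assigned to the hypothetical peer p2.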

10. Performance related problems

In the area of P2P-based MI, two problems always receive much attention: load balancing and update strategy. These problems are two important factors related to system performance, so when a new P2P-based MI is presented, they are often discussed at the same time.


10.1. Load balancing problem

From the perspective of the properties of the indexing overlay topology, load balancing strategies can be divided into explicit ones and implicit ones. An explicit load balancing strategy is one that must be adopted to compensate for unbalanced load caused by a flaw in the underlying topology. For example, a tree-like indexing topology, e.g., BATON or BATON*, tends to make some peers (e.g., the root) receive too much query load, so an explicit load balancing strategy should be adopted. An implicit load balancing strategy, in contrast, means that the indexing topology or algorithm itself has some capability to cope with unbalanced load. For example, a distributed quadtree can map non-uniform spatial data to a relatively balanced distribution over peers using a consistent hash function, e.g., SHA-1.

From the perspective of methods, load balancing strategies can be divided into load movement and peer movement (Ganesan et al., 2004; Godfrey et al., 2006). Load movement means that a heavily loaded peer transfers its excess load to a lightly loaded neighbor, while peer movement means that a lightly loaded peer executes a leave-and-rejoin operation to become a neighbor of a heavily loaded peer, which then sheds some load to its new neighbor. However, the leave-and-rejoin operation is prone to disrupt the structure of the topology, so adjustment or reconstruction must be performed to maintain it. Obviously, the adjustment or reconstruction is costly and reduces system performance. So, Rao et al. (2003) propose to balance the load using virtual peers. A physical peer is in charge of several virtual peers. A virtual peer maintains a partition of the indexing space and can be split when its load is too high. When a physical peer is overloaded, it can transfer a certain number of virtual peers to a lightly loaded physical peer.
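
The virtual-peer idea can be sketched as follows. The class, the load numbers and the `target` threshold are assumptions made for illustration; the selection rule (largest virtual peer first, never pushing the receiver above the target) is one simple heuristic, not the specific scheme of Rao et al.

```python
from dataclasses import dataclass, field

@dataclass
class PhysicalPeer:
    name: str
    # virtual peer id -> load of the index partition it maintains
    virtual_loads: dict = field(default_factory=dict)

    @property
    def load(self):
        return sum(self.virtual_loads.values())

def shed_virtual_peers(heavy, light, target):
    """Move whole virtual peers from `heavy` to `light`.

    Stops once `heavy` is at or below `target`; a virtual peer is moved
    only if `light` stays at or below `target` afterwards.
    """
    moved = []
    candidates = sorted(heavy.virtual_loads.items(),
                        key=lambda kv: kv[1], reverse=True)
    for vid, vload in candidates:
        if heavy.load <= target:
            break
        if light.load + vload <= target:
            light.virtual_loads[vid] = heavy.virtual_loads.pop(vid)
            moved.append(vid)
    return moved

heavy = PhysicalPeer("A", {"v1": 50, "v2": 30, "v3": 20})
light = PhysicalPeer("B", {"v4": 10})
moved = shed_virtual_peers(heavy, light, target=60)
```

Because only the ownership of a virtual peer changes, the overlay topology among virtual peers stays intact, which avoids the restructuring cost of leave-and-rejoin.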

From the perspective of the content of load, load balancing strategies can be divided into strategies for data load and strategies for query load. In a tree-like topology, for instance, the higher the level at which a peer resides, the more queries it tends to receive.

HRing, MURK, and SCRAP provide both load movement and peer movement. PRoBe, CAN-QTree and SkipIndex adopt virtual peers to balance load.

Mercury is similar to Chord because they both use a ring to organize peers. However, Mercury does not adopt consistent hashing, because it intends to support range queries. So Mercury must adopt an explicit load balancing strategy. Each peer samples loads in a piggyback mode to maintain an approximate global load histogram, so each peer can estimate the ratio of its load to the global average load, which can be used to determine whether it is heavily loaded. A heavily loaded peer delivers probe messages to the overlay network. If a lightly loaded peer receives such a message, it runs the peer movement strategy to balance the load.
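
A peer's decision based on sampled loads can be sketched as below. The threshold parameter `alpha` and the three-way classification are assumptions made for illustration; the sketch only shows how a load-to-average ratio could drive the decision and is not Mercury's actual code.

```python
from statistics import mean

def classify_load(own_load, sampled_loads, alpha=2.0):
    """Classify a peer from its own load and loads sampled from others.

    The average over the samples (plus the peer's own load) stands in for
    the global average. `alpha` is a hypothetical tolerance factor.
    """
    avg = mean(list(sampled_loads) + [own_load])
    if avg == 0:
        return "balanced"
    ratio = own_load / avg
    if ratio > alpha:
        return "heavy"      # would send probe messages
    if ratio < 1 / alpha:
        return "light"      # candidate for leave-and-rejoin
    return "balanced"

status_a = classify_load(90, [10, 20, 10, 30])
status_b = classify_load(10, [40, 50, 60])
status_c = classify_load(30, [20, 40, 30])
```

In this example the first peer carries about 2.8 times the estimated average and is classified as heavy, while the second carries a quarter of it and is classified as light.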

In BATON, BATON*, VBI-tree and SDI, the load to be balanced is query messages. Non-leaf peers only balance load with their adjacent peers. Leaf peers, in contrast, can either balance load with their adjacent peers or find another lightly loaded leaf peer to share their load.

Z-NET and Squid perform load balancing at peer join and at runtime. When a new peer joins the system, it picks a heavily loaded peer h and becomes a neighbor of h to share a portion of h's load. For runtime load balancing, they use load movement and virtual peers.

DP-tree proposes a new load balancing algorithm named wavelet-assisted load balancing. DP-tree argues that the dissemination of load information would increase network overhead, so it compresses the load information using a wavelet representation called loadwavelet. The loadwavelet provides peers with a multi-resolution perspective, which helps peers choose the most suitable lightly loaded peer to shed load to.
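
To give an intuition for a multi-resolution view of load, the sketch below uses a plain Haar averaging transform over a list of peer loads. This is a generic wavelet illustration chosen by us; the function names are hypothetical and it is not claimed to be DP-tree's actual loadwavelet construction.

```python
def haar_decompose(loads):
    """Haar decomposition of a load vector whose length is a power of two.

    Returns (overall_average, details) where `details` is ordered from the
    coarsest level to the finest level.
    """
    details = []
    current = list(loads)
    while len(current) > 1:
        pairs = range(0, len(current), 2)
        avgs = [(current[i] + current[i + 1]) / 2 for i in pairs]
        diffs = [(current[i] - current[i + 1]) / 2 for i in pairs]
        details.insert(0, diffs)
        current = avgs
    return current[0], details

def reconstruct(average, details, levels):
    """Rebuild the load view using only the first `levels` detail levels."""
    current = [average]
    for diffs in details[:levels]:
        finer = []
        for a, d in zip(current, diffs):
            finer.extend([a + d, a - d])
        current = finer
    return current

avg, details = haar_decompose([8, 4, 2, 2])
coarse = reconstruct(avg, details, 1)   # two regional averages
exact = reconstruct(avg, details, 2)    # full resolution
```

Sending only the average and the coarse detail levels costs fewer values than the full load vector, yet still tells a peer which region of the overlay is lightly loaded.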

EZSearch clusters peers to form a hierarchical topology in which query load balancing is considered. EZSearch holds that the peers located at high levels of the hierarchy are prone to receive heavy query load, so it balances load by having peers take turns to become the cluster head.

10.2. Update strategy

A reasonable update strategy is an important contributor to system performance, since it allows a P2P system to scale to applications with frequent updates. In the centralized mode, the metric of index update cost is the number of I/O operations, and data update is mainly considered; in the P2P environment, the metric of index update cost is communication cost, and not only data update but also peer joining, leaving and failure are considered.

RT-CAN adjusts the level of the nodes to be published according to the ratio of query frequency to update frequency. If node N in the R-tree is frequently queried but seldom updated, then the children of N are published to replace N. On the contrary, if node N is frequently updated but seldom queried, then the parent of N is published to replace N. This reduces the index maintenance cost to some extent.
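
The level adjustment rule can be sketched as a simple threshold test on the query-to-update ratio. The thresholds `high` and `low`, the function name and the node labels are hypothetical; the sketch illustrates the rule as described here and is not RT-CAN's actual cost model.

```python
def nodes_to_publish(node, children, parent, query_freq, update_freq,
                     high=2.0, low=0.5):
    """Decide which R-tree nodes to publish in place of `node`.

    high / low are assumed thresholds on query_freq / update_freq.
    """
    if update_freq == 0:
        ratio = float("inf") if query_freq > 0 else 1.0
    else:
        ratio = query_freq / update_freq
    if ratio >= high and children:
        return list(children)      # query-heavy: publish finer nodes
    if ratio <= low and parent is not None:
        return [parent]            # update-heavy: publish a coarser node
    return [node]                  # otherwise keep the current level

hot_for_queries = nodes_to_publish("N", ["N1", "N2"], "P", 100, 10)
hot_for_updates = nodes_to_publish("N", ["N1", "N2"], "P", 5, 50)
unchanged = nodes_to_publish("N", ["N1", "N2"], "P", 10, 10)
```

Publishing finer nodes makes queries more selective at the price of more published entries to maintain, while publishing a coarser node absorbs updates that stay within its bounding region.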

In NR-tree, super-peers and passive-peers watch each other constantly using heartbeat messages. When a passive-peer leaves or fails, the super-peer records this action in a log and updates the index; when a super-peer leaves or fails, the passive-peers run an election to select a new super-peer according to computation power and bandwidth.

DiST uses a piggyback mode to update the index. In BATON, BATON*, VBI-tree and SDI, peer joining, leaving and failure can unbalance the tree, so in these topologies a restructuring algorithm is used to adjust the tree. In m-LIGHT, the authors propose an incremental index maintenance algorithm and a data-aware splitting algorithm, both of which raise the performance of the index.

11. Conclusion

Indexing multidimensional data in P2P systems is a challenging problem. P2P-based MI techniques not only bring the centralized multidimensional index to the P2P paradigm, but also strengthen the ability to process complex multidimensional queries in P2P systems. In distributed GIS, distributed spatial databases and content-based distributed retrieval systems, this technique will provide scalable performance and accelerate query processing.

In this paper, we give an overview of P2P-based multidimensional indexing methods. First, we analyze how P2P-based MI inherits and evolves from earlier work, and use visualization, which is our innovation, to identify the research groups and their investigating styles. Then, we classify the techniques into 4 categories, and introduce each indexing topology and algorithm one by one. Finally, we describe the performance related problems of load balancing and update strategy. We hope this paper will give a global perspective of P2P-based MI to all researchers in this field and benefit the development of new indexing structures in P2P systems.

References

Aspnes, J., Shah, G., 2003. Skip graphs. In: Proceedings of the Fourteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 384–393.

Banaei-Kashani, F., Shahabir, C., 2004. Swam: a family of access methods for similarity-search in peer-to-peer data networks. In: ACM Conference on Information and Knowledge Management (CIKM).

Batko, M., Gennaro, C., Zezula, P., 2005. Similarity grid for searching in metric spaces. In: Proceedings of the P2P, Grid, and Service Orientation, pp. 25–44.

Beckmann, N., Kriegel, H., Schneider, R., Seeger, B., 1990. The R*-tree: an efficient and robust access method for points and rectangles. In: Proceedings of the ACM International Conference on the Management of Data (SIGMOD), pp. 322–331.


Bentley, J.L., 1975. Multidimensional Binary Search Trees Used for Associative Searching. Communications of the ACM 18 (9), 509–517.

Bertino, E., Ooi, B.C., 1999. The indispensability of dispensable indexes. IEEE Transactions on Knowledge and Data Engineering 11 (1), 17–27.

Bertino, E., Sacks-Davis, R., Ooi, B.C., Tan, K.-L., Zobel, J., et al., 1997. Indexing Techniques for Advanced Database Systems. Kluwer.

Bharambe, A.R., Agrawal, M., Seshan, S., 2004. Mercury: Supporting Scalable Multi-Attribute Range Queries. ACM SIGCOMM'04.

Cai, M., Frank, M., Chen, J., Szekely, P., 2003. MAAN: A multi-attribute addressable network for grid information services. In: International Workshop on Grid Computing.

Castelli, S., Costa, P., Picco, G.P., 2008. HyperCBR: large-scale content-based routing in a multidimensional space. In: Proceedings of the IEEE 27th Conference on Computer Communications (INFOCOM), pp. 1714–1722.

Chen, D.H., Zhou, J.J., Le, J.J., 2006. Reverse nearest neighbor search in peer-to-peer systems. In: Proceedings of the 7th International Conference on Flexible Query Answering Systems, pp. 87–96.

Crainiceanu, A., Linga, P., Gehrke, J., et al., 2004. Querying peer-to-peer networks using P-trees. In: Proceedings of the 7th International Workshop on the Web and Databases (WebDB), pp. 25–30.

Cui, B., Qian, W.N., Xu, L.H., Zhou, A.Y., 2007. LINP: supporting similarity search in unstructured peer-to-peer networks. In: Proceedings of the 9th Asia-Pacific Web Conference/8th International Conference on Web-Age Information Management, pp. 127–135.

Doulkeridis, C., Norvag, K., Vazirgiannis, M., 2007a. DESENT: decentralized and distributed semantic overlay generation in P2P networks. IEEE Journal on Selected Areas in Communications 25 (1), 25–34.

Doulkeridis, C., Vlachou, A., Kotidis, Y., Vazirgiannis, M., 2007b. Peer-to-peer similarity search in metric spaces. In: Proceedings of the International Conference on Very Large Data Bases (VLDB), pp. 986–997.

Falchi, F., Gennaro, C., Zezula, P., 2005. A content-addressable network for similarity search in metric spaces. In: Proceedings of the DBISP2P, pp. 126–137.

Faloutsos, C., 1986. Multiattribute hashing using gray-codes. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 227–238.

Faloutsos, C., Roseman, S., 1989. Fractals for secondary key retrieval. In: Proceedings of the 8th ACM SIGACT–SIGMOD–SIGART Symposium on Principles of Database Systems, pp. 247–252.

Finkel, R., Bentley, J.L., 1974. Quad-trees: a data structure for retrieval on composite keys. Acta Informatica 4 (1), 1–9.

Gaede, V., Gunther, O., 1998. Multidimensional access methods. ACM Computing Surveys 30 (2), 170–231.

Ganesan, P., Yang, B., Garcia-Molina, H., 2004. One Torus to Rule Them All: Multidimensional Queries in P2P Systems. WebDB.

Ganesan, P., Bawa, M., Garcia-Molina, H., 2004. Online balancing of range-partitioned data with applications to peer-to-peer systems. In: Proceedings of the 30th International Conference on Very Large Data Bases (VLDB), pp. 444–455.

Gao, J., Steenkiste, P., 2007. Efficient support for similarity searches in DHT-based Peer-to-Peer systems. In: Proceedings of the IEEE International Conference on Communications (ICC), pp. 1867–1874.

Godfrey, B., Lakshminarayanan, K., Surana, S., et al., 2006. Load balancing in dynamic structured p2p systems. Performance Evaluation 63 (3), 217–240.

Guttman, A., 1984. R-tree: a dynamic index structure for spatial searching. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 47–54.

Jagadish, H.V., 1990. Linear clustering of objects with multiple attributes. In: Proceedings of the ACM SIGMOD International Conference on Management of Data, pp. 332–342.

Jagadish, H.V., Ooi, B.C., Tan, K-L., Yu, C., Zhang, R., 2005. iDistance: an adaptive B+-tree based indexing method for nearest neighbor search. ACM Transactions on Database Systems 30 (2), 364–397.

Jagadish, H.V., Ooi, B.C., Tan, K-L., Vu, Q.H., Zhang, R., 2006a. Speeding up Search in Peer-to-Peer Networks with A Multi-way Tree Structure. SIGMOD.

Jagadish, H.V., Ooi, B.C., Vu, Q.H., Zhang, R., Zhou, A., 2006b. VBI-Tree: a peer-to-peer framework for supporting multi-dimensional indexing schemes. In: Proceedings of the ICDE, pp. 34–43.

Jagadish, H.V., Ooi, B.C., Vu, Q.H., 2005. BATON: a balanced tree structure for peer-to-peer networks. In: Proceedings of the 31st VLDB Conference, pp. 661–672.

Kalnis, P., Ng, W.S., Ooi, B.C., Tan, K-L., 2006. Answering similarity queries in peer-to-peer networks. Information Systems 31 (1), 57–72.

Kamel, I., Faloutsos, C., 1994. Hilbert R-tree: an improved R-tree using fractals. In: Proceedings of the 20th International Conference on Very Large Data Bases (VLDB), pp. 500–509.

Kang, H.-Y., Lim, B-J., Li, K-J., 2004. P2P spatial query processing by delaunay triangulation. In: Proceedings of the W2GIS, pp. 136–150.

Kantere, V., Skiadopoulos, S., Sellis, T., 2009. Storing and indexing spatial data in P2P systems. IEEE Transactions on Knowledge and Data Engineering (TKDE) 21 (2), 287–300.

Kleinberg, J., 2000. The small-world phenomenon: an algorithmic perspective. In: Proceedings of the 32nd ACM STOC, pp. 163–170.

Korn, F., Muthukrishnan, S., 2000. Influence Sets Based on Reverse Nearest Neighbor Queries. ACM SIGMOD.

Lee, J., Lee, H., Kang, S., Kim, S.M., Song, J., 2004. CISS: an efficient object clustering framework for DHT-based peer-to-peer applications. In: International Workshop on Databases, Information Systems and Peer-to-Peer Computing.


Li, M., Lee, W.-C., Sivasubramaniam, A., 2006. DPTree: a balanced tree based indexing framework for peer-to-peer systems. In: Proceedings of the 14th IEEE International Conference on Network Protocols (ICNP), pp. 12–21.

Liu, B., Lee, W.-C., Lee, D.L., 2005. Supporting complex multi-dimensional queries in P2P systems. In: Proceedings of the 25th IEEE International Conference on Distributed Computing Systems, pp. 155–164.

Lua, E.K., Crowcroft, J., Pias, M., et al., 2005. A survey and comparison of peer-to-peer overlay network schemes. IEEE Communications Surveys and Tutorials 7 (2), 72–93.

Milojicic, D.S., Kalogeraki, V., 2002. Peer-to-Peer Computing. HP Labs, Palo Alto, Tech Report: HPL-2002-57.

Mondal, A., Yilifu, Kitsuregawa, M., 2004. P2PR-tree: an R-tree-based Spatial Index for peer-to-peer environments. In: EDBT Workshops, pp. 516–525.

Nam, B., Sussman, A., 2006. DiST: fully decentralized indexing for querying distributed multidimensional datasets. In: International Parallel and Distributed Processing Symposium (IPDPS).

Novak, D., Zezula, P., 2006. M-Chord: a Scalable Distributed Similarity Search Structure. InfoScale.

Ntarmos, N., Pitoura, T., Triantafillou, P., 2006. Range query optimization leveraging peer heterogeneity in DHT Data Networks. In: Proceedings of the 4th International Workshop on Databases, Information Systems, and Peer-to-Peer Computing, pp. 111–122.

Okabe, A., Boots, B., Sugihara, K., Chiu, S., 2000. Spatial Tessellations: Concepts and Applications of Voronoi Diagrams, 2nd ed. John Wiley.

Orenstein, J., Merrett, T.H., 1984. A class of data structures for associative searching. In: Proceedings of the 3rd ACM SIGACT–SIGMOD Symposium on Principles of Database Systems, pp. 181–190.

Prefuse, 2010. http://prefuse.org.

Preparata, F.P., Shamos, M.I., 1985. Computational Geometry: An Introduction. Springer Verlag.

Rao, A., Lakshminarayanan, K., Surana, S., et al., 2003. Load Balancing in Structured P2P Systems. IPTPS.

Ratnasamy, S., Francis, P., Handley, M., Karp, R., Shenker, S., 2001. A scalable content-addressable network. In: Proceedings of the ACM SIGCOMM, pp. 161–172.

Roussopoulos, N., Kelly, S., Vincent, F., 1995. Nearest Neighbor Queries. SIGMOD.

Rowstron, A., Druschel, P., 2001. Pastry: Scalable, Distributed Object Location and Routing for Large-scale peer-to-peer Systems. Middleware.

Sahin, O.D., Antony, S., Agrawal, D., Abbadi, A.E., 2005. PRoBe: Multi-dimensional Range Queries in P2P Networks. WISE.

Samet, H., 1990a. Applications of Spatial Data Structures. Addison-Wesley, Reading, MA.

Samet, H., 1990b. The Design and Analysis of Spatial Data Structures. Addison-Wesley, Reading, MA.

Schmidt, C., Parashar, M., 2003. Flexible information discovery in decentralized distributed systems. In: IEEE International Symposium on High Performance Distributed Computing.

Shu, Y., Ooi, B.C., Tan, K.-L., Zhou, A., 2005. Supporting multi-dimensional range queries in peer-to-peer systems. In: IEEE International Conference on Peer-to-Peer Computing (P2P).

Stanoi, I., Agrawal, D., Abbadi, A.E., 2000. Reverse nearest neighbor queries for dynamic databases. In: SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery.

Stoica, I., Morris, R., Karger, D., Kaashoek, M.F., et al., 2001. Chord: a scalable peer-to-peer lookup service for internet applications. In: Proceedings of the ACM SIGCOMM, pp. 149–160.

Sun, X., 2007. SCAN: a small-world structured P2P overlay for multi-dimensional queries. In: Proceedings of the WWW, pp. 1191–1192.

Tang, Y.Z., Xu, J.L., Zhou, S.G., Lee, W.-C., 2009. m-LIGHT: indexing multi-dimensional data over DHTs. In: Proceedings of the 29th IEEE International Conference on Distributed Computing Systems, pp. 191–198.

Tanin, E., Harwood, A., Samet, H., 2007. Using a distributed quadtree index in peer-to-peer networks. VLDB Journal 16 (2), 165–178.

Tran, D.A., Nguyen, K., 2008. Multidimensional information retrieval in peer-to-peer networks. In: IEEE International Parallel and Distributed Processing Symposium (IPDPS 2008).

Uhlmann, J.K., 1991. Satisfying general proximity/similarity queries with metric trees. Information Processing Letters (IPL) 40, 175–179.

Vu, Q.H., Lupu, M., Wu, S., 2009. SiMPSON: Efficient Similarity Search in Metric Spaces over P2P Structured Overlay Networks. In: Proceedings of the 15th International Euro-Par Conference on Parallel Computing, pp. 498–510.

Wang, J., Wu, S., Gao, H., Li, J., Ooi, B.C., 2010. Indexing multi-dimensional data in a cloud system. In: ACM International Conference on the Management of Data (SIGMOD).

Wei, X., Sezaki, K., 2006. DHR-Trees: a distributed multidimensional indexing structure for P2P systems. In: International Symposium on Parallel and Distributed Computing (ISPDC).

Yang, B., Garcia-Molina, H., 2003. Designing a Super-peer Network. In: International Conference on Data Engineering (ICDE).

Yang, C., Lin, K.I., 2001. An index structure for efficient reverse nearest neighbor queries. In: ICDE.

Yin, W.K., Zhu, M., Jiang, L., 2009. R-chord: a distributed similarity retrieval system with RPCID. In: Proceedings of the IEEE International Conference on Network Infrastructure and Digital Content, pp. 393–399.

Yu, X.P., Yu, X.G., 2007. An adaptive algorithm for P2P K-Nearest neighbor search in high dimensions. In: Proceedings of the 2007 IEEE International Conference on Control and Automation, pp. 236–241.

Zhang, C., Krishnamurthy, A., Wang, R.Y., 2004. SkipIndex: Towards a Scalable Peer-to-Peer Index Service for High Dimensional Data. Technical Report, TR-703-04, Princeton University.

Zhang, R., Qian, W.N., Zhou, A., Zhou, M.Q., 2009. An efficient peer-to-peer indexing tree structure for multidimensional data. Future Generation Computer Systems – The International Journal of Grid Computing Theory Methods and Applications 25 (1), 77–88.

Zhao, B.Y., Huang, L., Stribling, J., Rhea, S.C., et al., 2004. Tapestry: a resilient global-scale overlay for service deployment. IEEE Journal on Selected Areas in Communications 22 (1), 41–53.

Zhuge, H., Chen, X., Sun, X.P., Yao, E., 2008. HRing: a structured P2P overlay based on harmonic series. IEEE Transactions on Parallel and Distributed Systems 19 (2), 145–158.

Zuo, H.Y., Ning, J., Deng, Y.D., Luo, C., 2008. CAN-QTree: a distributed spatial index for peer-to-peer networks. In: Proceedings of the 10th IEEE International Conference on High Performance Computing and Communications, pp. 250–257.

Chong Zhang received the B.S. and M.S. degrees from National University of Defense Technology (NUDT) in 2005 and 2008, respectively. He is currently pursuing the Ph.D. degree in the Science and Technology Information Systems Engineering Laboratory of NUDT, Changsha, China. His current research interests are in spatio-temporal index, P2P computing and data management in P2P networks.

Weidong Xiao received the B.S. and M.S. degrees from National University of Defense Technology (NUDT) in 1990 and 1998, respectively, and obtained his Ph.D. there in 2004. He is currently a Professor of information resource management and also a supervisor of Ph.D. candidates in the Science and Technology Information Systems Engineering Laboratory of NUDT. He is a director of the Chinese Institute of Management Science and Engineering. His current research interests are in data management in P2P networks, spatio-temporal index and Internet of Things (IoT).

Daquan Tang received the B.S. and M.S. degrees from National University of Defense Technology (NUDT) in 1994 and 2002, respectively. He is currently a Professor at the College of Information System and Management of NUDT, where he obtained his Ph.D. in 2009. He is a senior member of the Chinese Institute of Electronics. His current research interests are in data management in P2P networks and spatio-temporal data management.

Jiuyang Tang received the B.S. degree from National University of Defense Technology (NUDT) in 2000. He is currently an Associate Professor at the College of Information System and Management of NUDT, where he obtained his Ph.D. in 2006. His current research interests are in topology control in P2P, P2P computing and spatio-temporal data management.