
The Top-k Skyline Query in Pervasive Computing Environments

Peng Pan, YuQing Sun, Qingzhong Li, ZhiYong Chen, Ji Bian

School of Computing Science and Technology

ShanDong University

JiNan, P.R. China

ppan, sunyuqing, lqz,chenzy,[email protected]

Abstract-In pervasive computing environments, more and more applications and platforms based on mobile phones and PDAs have emerged. People can get what they want from these platforms anywhere. Top-k and skyline queries are two methods for satisfying a user's preferences. In many situations, the two query technologies need to be combined to satisfy the user's preferences; the combination is called a top-k skyline query. Based on a given pervasive computing scenario, we formulate a theoretical model for the top-k skyline query and describe an algorithm that processes the skyline and top-k queries synchronously. In experiments, it shows better performance than the algorithm proposed in [3].

Keywords- Pervasive computing, skyline, top-k, top-k skyline

I. INTRODUCTION

Recently, pervasive computing has attracted increasing attention from researchers as the related technologies and theories have developed and matured [1]. The ultimate goal of pervasive computing is to integrate the information space constructed by communication and computing with the physical space in which people live and work [2].

With the development of mobile technology, more and more applications and platforms based on mobile phones and PDAs have emerged. People can get what they want from the information services provided by these platforms by submitting queries according to their preferences. To date, there are two major methods for querying based on a user's preferences: top-k and skyline. Both are used to solve the traditional problem of multi-objective optimization. A top-k query retrieves the best k objects that minimize a specific preference function, which transforms the multi-objective optimization problem into a single-objective one. A skyline query, given a set of d-dimensional objects, returns the "best" objects, that is, those not dominated by any other object. Unlike the top-k query, the skyline query solves the original multi-objective problem with multi-objective optimization algorithms. Because the result is a set whose elements do not dominate one another and the set may be large, it is infeasible and of little use to present all the results to the user. We may refine the skyline results with a top-k query. An example is as follows:

Example 1:

978-1-4244-5228-6/09/$26.00 ©2009 IEEE 335

A tourist is interested in the top-k restaurants with the best food, where these restaurants are also the closest to a certain spot and the cheapest.

The tourist can formulate the following top-k query:

    Select *
    From Guide
    Skyline of cheap(Price) max, close(Address, HotelAd) max
    Order By max(quality(Food))
    Top k

Example 1 is a combination of a skyline query and a top-k query. The function cheap is a scoring function that ranks the restaurants by price; close is a user-defined function that orders the restaurants by their distance from a certain spot; quality is a user-defined function that evaluates the food quality of each restaurant. The query first executes the 'Skyline' operator to get the skyline results, and then executes 'Order By ... Top k' to get the final results. An algorithm for the top-k skyline query has been proposed in [3]. However, it is inefficient to separate the skyline and top-k processes into two isolated steps. In this paper, we present a new algorithm that combines the skyline and top-k processes into one integrated process, and the experiments show better performance than the algorithm in [3].

The paper is organized as follows. Section 2 describes a scenario in a pervasive computing environment that motivates our research. Section 3 discusses related work. Section 4 defines the theoretical model for the top-k skyline query. Based on the scenario of Section 2, Section 5 describes an algorithm for the top-k skyline query. Section 6 presents the experiments, and Section 7 concludes.

II. MOTIVATING SCENARIO

To illustrate the research focus and motivate our discussion, consider the following scenario:

Sams is traveling to JiNan, which is famous in China for its splendid springs. He wants to find a hotel with the best food that is also cheap and close to a famous spring, BaoTu Spring. Using his PDA, he submits his preferences by visiting a mobile travel service platform. After executing the top-k skyline query operator as in Example 1, the platform

A point $p_i$ is a skyline point in $S$ if $p_i$ satisfies $\neg\exists p_j : p_j \neq p_i \wedge p_j$ dominates $p_i$.

Definition 3: scoring function

The scoring function s on relation R is a function from R to the real numbers, i.e., $s : R \to \mathbb{R}$. The function s induces a preference relation $\succ_s$ and an indifference relation $\sim_s$ on R. For any two different tuples $t_i$ and $t_j$:

$t_i \succ_s t_j$ iff $s(t_i) > s(t_j)$

$t_i \sim_s t_j$ iff $s(t_i) = s(t_j)$

Definition 4: the set for top-k query on R

Given a relation R, a non-negative integer k, and a scoring function s, a set for the top-k query satisfying s on R is a tuple set T such that:

1. $T \subseteq R$

2. if $|R| < k$, $T = R$; otherwise $|T| = k$

3. $\forall t \in T, \forall t' \in R - T$: $t \succ_s t'$ or $t \sim_s t'$

Definition 5: the set for top-k query on skyline query

SP is a skyline set in terms of Definition 2, k is a non-negative integer, and f is a scoring function on SP. A set for the top-k query satisfying f on SP is a tuple set SPT such that:

1. $SPT \subseteq SP$

2. if $|SP| < k$, $SPT = SP$; otherwise $|SPT| = k$

3. $\forall st \in SPT, \forall st' \in SP - SPT$: $st \succ_f st'$ or $st \sim_f st'$
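The three conditions of Definitions 4 and 5 can be checked mechanically. The following is a minimal Python sketch, assuming distinct tuples and a scoring function where a larger score is preferred, as in Definition 3. The function names and the quality scores are hypothetical and chosen only for illustration; they do not come from the paper's data.

```python
import heapq

def top_k_set(tuples, k, score):
    """Return one set that satisfies Definition 4 (or 5 if `tuples` is a skyline)."""
    if len(tuples) < k:
        return list(tuples)                      # condition 2, first case
    return heapq.nlargest(k, tuples, key=score)  # k tuples with the highest scores

def satisfies_definition(tuples, k, score, chosen):
    """Check conditions 1 to 3 for a candidate set `chosen` (tuples assumed distinct)."""
    rest = [t for t in tuples if t not in chosen]
    subset_ok = all(t in tuples for t in chosen)
    size_ok = len(chosen) == (len(tuples) if len(tuples) < k else k)
    order_ok = all(score(t) >= score(u) for t in chosen for u in rest)
    return subset_ok and size_ok and order_ok

# Hypothetical food-quality scores for five skyline tuples (illustration only).
quality = {"i": 3.0, "o": 4.5, "p": 4.0, "h": 2.5, "d": 5.0}
skyline_ids = list(quality)
best = top_k_set(skyline_ids, 3, quality.get)
```

Because condition 3 allows indifference, the set is not unique when scores tie at the cut-off; any choice among the tied tuples satisfies the definition.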

Figure 2 is an example of the model in which relation R has two dimensions. Each $p_i$ is a skyline point, and $q_i$ is the corresponding point under a scoring function f. The objective is to find the top k points according to the scoring function f, whose domain is the set of skyline points.

Figure 2. An example of the model for top-k skyline query

[Figure 2 image: axis labels x, y, z]

[Figure 1 image: legible labels include repository, server, and PDA]

In Figure 1, the mobile web server provides a portal to the PDA, transfers the user's preferences to the top-k skyline query engine, and presents the results to the user. Based on the data repository, the query engine executes the query operators, including the top-k skyline algorithm described in this paper.

Figure 1. The architecture of the mobile information service platform

presents the results to the user. Figure 1 presents the architecture for the scenario.

III. RELATED WORKS

Among the query answering algorithms proposed for top-k queries, the Threshold Algorithm (TA) [4] is a representative one. It is generally applicable in database applications, but inefficient in large distributed networks. Therefore, several variations of TA have been proposed, such as the Three-Phase Uniform-Threshold algorithm (TPUT) [5], BPA and BPA2 [6].

The term skyline was introduced in the database literature [7] to refer to the secondary storage version of the maximal vector set problem. Since then, several algorithms have been proposed for skyline queries, including solutions without preprocessing (e.g., [8]), presorting solutions (e.g., [9]), and index-based solutions (e.g., [10], [11]). The partition index algorithm presented in [11] motivated this paper.

Recently, solutions for the combination of skyline and top-k have been defined [12], [13], [14]. In general, these solutions calculate the first stratum or skyline with some sort of post-processing. However, because they do not consider situations such as the one in Section 2, these solutions do not identify the k best answers. The authors of [15], [16] proposed the problem of selecting k skyline points so that the number of points dominated by at least one of these k skyline points is maximized. The works most closely related to this paper are [3], [17], whose authors attempt to solve the top-k skyline problem by proposing models and algorithms.

IV. THE MODEL FOR TOP-K SKYLINE QUERY

Given a space S with d dimensions, $S = \{s_1, s_2, \ldots, s_d\}$, and a set D of points in S, $D = \{p_1, \ldots, p_n\}$, each $p_i \in D$ is a point with d dimensions in S. We denote the value of the jth dimension of point $p_i$ by $p_i.s_j$. For each $s_j$, we assume there is a total order, denoted by $\succ_j$, on its value domain, where $\succ_j$ can be the relation '<' or '>' according to the user's preference.

Definition 1: dominate

A point $p_i$ is said to dominate another point $p_j$ on S if and only if $\forall s_k \in S : p_i.s_k \succeq_k p_j.s_k$, and $\exists s_l \in S : p_i.s_l \succ_l p_j.s_l$.

Definition 2: skyline, SP(D, S)
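Definitions 1 and 2 translate directly into a quadratic-time check. The sketch below is a naive illustration in Python, not the algorithm of this paper. The names `dominates`, `skyline`, and `better` are assumptions introduced here; `better[k]` plays the role of the order $\succ_k$ and is either `operator.lt` or `operator.gt`. The sample tuples are (price, distance) pairs taken from Table 1, with smaller values preferred in both dimensions.

```python
import operator

def dominates(p, q, better):
    """Definition 1: p is at least as good as q everywhere and strictly better somewhere."""
    at_least_as_good = all(b(x, y) or x == y for x, y, b in zip(p, q, better))
    strictly_better = any(b(x, y) for x, y, b in zip(p, q, better))
    return at_least_as_good and strictly_better

def skyline(points, better):
    """Definition 2: keep the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p, better) for q in points if q != p)]

# (price in RMB, distance in m) for hotels c, d, h, i, p, a of Table 1.
hotels = [(620, 1996), (1350, 1068), (667, 1169), (199, 2410), (580, 1210), (668, 1380)]
prefer_small = (operator.lt, operator.lt)
result = skyline(hotels, prefer_small)
```

In this sample, hotels c and a are both dominated by hotel p, which is cheaper and closer, so they drop out of the skyline.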

V. ALGORITHM FOR TOP-K SKYLINE QUERY

A. An Example of Data Instances for Algorithm

In the data repository related to the user's query in the scenario of Section 2, the original tuples are listed in Table 1. To apply the skyline algorithm proposed in this paper, we need to normalize the values in each dimension. We rank the values of each dimension of Table 1 and transform the dimension values by substituting each real value with its rank among all values of that dimension. Table 2 is the result of the transformation.
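The rank transformation can be sketched in a few lines of Python. The raw values below are the (price, distance) pairs of Table 1. The function name `to_ranks` is hypothetical, and the tie-breaking rule is an assumption: the text does not say how equal values are ranked, so ties are broken here by the tuple's remaining values, which happens to reproduce the ranks of hotels g and p (both priced at 580) shown in Table 2.

```python
def to_ranks(rows):
    """Replace each raw value by its 1-based rank within its dimension.

    rows: dict mapping a tuple id to its raw values.
    Ties are broken by the whole tuple (an assumption, not stated in the text).
    """
    d = len(next(iter(rows.values())))
    ranks = {key: [0] * d for key in rows}
    for dim in range(d):
        ordered = sorted(rows, key=lambda key: (rows[key][dim], rows[key]))
        for rank, key in enumerate(ordered, start=1):
            ranks[key][dim] = rank
    return {key: tuple(r) for key, r in ranks.items()}

raw = {
    "a": (668, 1380), "b": (1725, 1411), "c": (620, 1996), "d": (1350, 1068),
    "e": (1136, 2818), "f": (897, 1949), "g": (580, 2212), "h": (667, 1169),
    "i": (199, 2410), "j": (1280, 2284), "k": (1160, 1354), "l": (860, 2034),
    "m": (830, 2068), "n": (720, 3101), "o": (238, 1998), "p": (580, 1210),
}
ranked = to_ranks(raw)
```

Ranking puts every dimension on the same scale from 1 to n, which is what allows values of different dimensions to be compared when the tuples are partitioned by their minimum dimension.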


Table 1. An Example of Data Instances

Table 2. Result of Transformation

B. Algorithm for Top-k Skyline Query

The algorithm partitions the 2-dimensional tuples into two lists in such a way that tuple $t = (p_1, \ldots, p_d)$ is put into list i if the ith dimension value of t is the minimum among all dimensions of t, that is, $p_i \le p_j$ for all $j \in [1, d]$, $j \ne i$. For example, tuple c ("GuiDu") is put into List1, since its normalized dimension values are (5, 8). Table 3 shows the final lists. The tuples in the two lists can be indexed by any dimensional indexing structure. In this paper, we use a B+-tree to index the tuples, as shown in Figure 3.
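The partitioning rule can be illustrated as follows. This Python sketch uses plain sorted lists in place of the B+-tree, and the function name `partition_by_min` is hypothetical. When several dimensions share the minimum, the tuple is placed in the list of the first such dimension; that tie rule is an assumption. The sample uses the normalized values of hotels c, d, h, i, k, and p from Table 2.

```python
def partition_by_min(ranked):
    """Put each tuple into list i, where dimension i holds its minimum value.

    ranked: dict mapping a tuple id to its normalized values.
    Returns d lists of (id, minimum value), each sorted by the minimum value.
    """
    d = len(next(iter(ranked.values())))
    lists = [[] for _ in range(d)]
    for key, values in ranked.items():
        i = values.index(min(values))   # first dimension holding the minimum
        lists[i].append((key, values[i]))
    for entries in lists:
        entries.sort(key=lambda entry: entry[1])
    return lists

sample = {"c": (5, 8), "d": (15, 1), "h": (6, 2), "i": (1, 14), "k": (13, 4), "p": (3, 3)}
list1, list2 = partition_by_min(sample)
```

Sorting each list by its minimum value gives the scan order that the algorithm follows: tuples with small minimum values are read first.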

Table 3. Final Lists

VI. EXPERIMENT

We now present experimental results to illustrate the discussion in Section 5. Experiments were performed on a 2 GHz Pentium IV Windows desktop with 1 GB of RAM and 60 GB of hard disk space. The algorithm in Section 5 was

getNextNode(t) returns the next node for some node in a B+-tree,

computePartitionSkyline(P) computes the skyline set of the node set P, and may adopt any existing skyline algorithm,

computeNewSkyline(Sj, S) computes the new skyline from Sj and S,

computeTop-k gets the top-k results from the existing skyline set.

    Input: Dataset D
    Input: B+-tree B
    Output: Skyline S
    for i = 1 to d
        f_i ← True
        t_i ← traverseTreeForMin(root, i)
        max[i] ← maxValue(t_i)
        min[i] ← minValue(t_i)
    mn ← min_{i=1..d} max[i]
    mx ← min_{i=1..d} min[i]
    for i = 1 to d
        if mn < max[i]
            f_i ← False
    j ← 1
    S ← ∅
    T ← ∅
    while there are some partitions to be searched
        for i = 1 to d
            if min[i] == mx
                P_j ← t_i
                S_j ← ∅
                t_i ← getNextNode(t_i)
                while (minValue(t_i) == mx)
                    mn ← min(mn, maxValue(t_i))
                    P_j ← P_j ∪ t_i
                    t_i ← getNextNode(t_i)
                min[i] ← minValue(t_i)
                S_j ← computePartitionSkyline(P_j)
                S ← S ∪ computeNewSkyline(S_j, S)
        T ← computeTop-k(S)
        if |T| == k
            end
        j ← j + 1
        mx ← min_{i=1..d} min[i]
        for i = 1 to d
            if mn < min[i]
                f_i ← False

Figure 4. The Algorithm for Top-k Skyline Query
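To make the control flow of Figure 4 concrete, the following Python sketch follows the same idea under simplifying assumptions. Sorted in-memory lists stand in for the B+-tree, smaller values are preferred in every dimension, a larger score is preferred, and the partition skyline and the merge with the existing skyline are folded into one dominance check. The names `top_k_skyline`, `dominates`, and `head_min` are hypothetical, and the scoring function in the usage lines is invented for illustration. It is a sketch of the technique, not the authors' implementation.

```python
import heapq

def dominates(p, q):
    """p dominates q when smaller values are preferred in every dimension."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def top_k_skyline(points, k, score):
    d = len(points[0])
    lists = [[] for _ in range(d)]
    for p in points:
        lists[p.index(min(p))].append(p)     # list i: dimension i holds the minimum
    for entries in lists:
        entries.sort(key=min)                # sorted list stands in for the B+-tree
    pos = [0] * d                            # next unread entry of each list
    skyline = []
    mn = float("inf")                        # smallest maxValue among tuples read so far

    def head_min(i):
        return min(lists[i][pos[i]])

    while True:
        # A list is dropped once its next minimum exceeds mn: every unread
        # tuple in it is then dominated by the tuple whose maxValue is mn.
        active = [i for i in range(d) if pos[i] < len(lists[i]) and head_min(i) <= mn]
        if not active:
            break
        mx = min(head_min(i) for i in active)
        batch = []
        for i in active:
            while pos[i] < len(lists[i]) and head_min(i) == mx:
                batch.append(lists[i][pos[i]])
                pos[i] += 1
        mn = min([mn] + [max(p) for p in batch])
        seen = skyline + batch
        skyline.extend(p for p in batch if not any(dominates(q, p) for q in seen))
        if len(skyline) >= k:                # stop as soon as k skyline tuples emerged
            break
    return heapq.nlargest(k, skyline, key=score)

# Normalized (price rank, distance rank) values of Table 2, hotels a to p.
points = [(7, 5), (16, 6), (5, 8), (15, 1), (12, 15), (11, 7), (4, 12), (6, 2),
          (1, 14), (14, 13), (13, 4), (10, 10), (9, 11), (8, 16), (2, 9), (3, 3)]
score = lambda p: -sum(p)                    # hypothetical scoring function
all_five = top_k_skyline(points, 5, score)
first_two = top_k_skyline(points, 2, score)
```

Tuples are read in ascending order of their minimum value, so a tuple that survives the dominance check in its round cannot be dominated by anything read later. With the early stop, the scoring function ranks only the skyline tuples confirmed at the moment the scan ends, which is how the text describes the termination condition.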

Name                 ID   Price (RMB)   Distance (m)   Price (rank)   Distance (rank)
Shunhe Tianxi        a    668           1380           7              5
SofitelSilverPlaza   b    1725          1411           16             6
GuiDu                c    620           1996           5              8
GuiheCrownHoliday    d    1350          1068           15             1
LongDu               e    1136          2818           12             15
GoodFriend           f    897           1949           11             7
Nanjiao              g    580           2212           4              12
CityOfSpring         h    667           1169           6              2
RuJia                i    199           2410           1              14
ShandongBuilding     j    1280          2284           14             13
Qilu Wei...          k    1160          1354           13             4
NewsBuilding         l    860           2034           10             10
ZhongHao             m    830           2068           9              11
ShunGengVilla        n    720           3101           8              16
YaYue                o    238           1998           2              9
YuQuan               p    580           1210           3              3

Figure 3. B+-tree for the Lists of Table 3

List1                       List2
Point   Minimum distance    Point   Minimum distance
i       1                   d       1
o       2                   h       2
p       3                   k       4
g       4                   a       5
c       5                   b       6
n       8                   f       7
m       9                   j       13
l       10
e       12

Name                 ID   Price (RMB)   Distance (m)
GuiheCrownHoliday    d    1350          1068
CityOfSpring         h    667           1169
YuQuan               p    580           1210
Qilu Wei...          k    1160          1354
Shunhe Tianxi        a    668           1380
SofitelSilverPlaza   b    1725          1411
GoodFriend           f    897           1949
GuiDu                c    620           1996
YaYue                o    238           1998
NewsBuilding         l    860           2034
ZhongHao             m    830           2068
Nanjiao              g    580           2212
ShandongBuilding     j    1280          2284
RuJia                i    199           2410
LongDu               e    1136          2818
ShunGengVilla        n    720           3101

The algorithm proposed in [3] divides the process into two separate steps: the skyline query and the top-k query. It runs the top-k query on the complete skyline set to get the final results, which is inefficient. The algorithm in this paper can obtain the final top-k skyline results without scanning the whole data space, and, by normalizing the values in each dimension, it processes the skyline query and the top-k query synchronously. As soon as k top-k skyline tuples have emerged, the query process ends, so it is unnecessary to traverse the whole skyline set. Figure 4 gives the detailed algorithm, where

f_i is a flag that determines whether the ith dimension still needs to be searched. If f_i is set to false, all subsequent tuples are dominated by some tuple, and the ith dimension need not be searched any more,

maxValue(t) returns the maximum value among all dimensions of tuple t, i.e., the value $t.s_i$ that satisfies $t.s_i \ge t.s_j$ ($j \in [1, d] \wedge j \ne i$),

minValue(t) returns the minimum value among all dimensions of tuple t, i.e., the value $t.s_i$ that satisfies $t.s_i \le t.s_j$ ($j \in [1, d] \wedge j \ne i$),

traverseTreeForMin(root, i) traverses the B+-tree to obtain the tuple with the minimum value for the ith dimension,


Figure 5. The Experiment for B-algorithm and N-algorithm

VII. CONCLUSION AND FUTURE WORKS

This paper has discussed the problems related to the combination of the top-k query and the skyline query, including the theoretical model and the algorithm. Based on a given scenario in a pervasive computing environment, we have formulated a theoretical model for the top-k skyline query and demonstrated the algorithm. Finally, we evaluated the algorithm, which shows better performance than the algorithm in [3].

Many other problems remain to be solved. We need to find better ways to improve the algorithm's performance.

implemented in C, and it accesses the database through cursor operations.

The objective of the experiment is to compare the execution efficiency of the top-k skyline algorithm in this paper with that of the algorithm proposed in [3], and to reveal the effect of the number of dimensions on execution time.

We take 10000 tuples as test vectors and run the two algorithms with the number of dimensions varying from 1 to 10. Figure 5 shows the results. The curve with triangle points, labeled B-algorithm, presents the performance of the algorithm in [3]; the curve with square points, labeled N-algorithm, shows the performance of this paper's algorithm.

Since increasing the number of dimensions produces more skyline tuples, more time is spent processing them. Moreover, when the number of dimensions reaches a certain point, the efficiency of both algorithms decreases rapidly. However, the algorithm proposed in this paper does not need to scan all the tuples to get the final top-k result, and therefore shows better performance than the algorithm proposed in [3].
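The growth of the skyline with the number of dimensions can be observed on synthetic data. The Python sketch below is only an illustration under stated assumptions: the points are independent uniform random values, the sample is much smaller than the 10000 tuples of the experiment, and the function names are hypothetical. It does not reproduce the measurements of Figure 5.

```python
import random

def skyline_size(points):
    """Count the points that no other point dominates (smaller is better)."""
    def dominates(p, q):
        return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))
    return sum(1 for p in points if not any(dominates(q, p) for q in points))

def sizes_by_dimension(n, dims, seed=0):
    """Skyline size of n uniform random points for each number of dimensions."""
    rng = random.Random(seed)
    sizes = {}
    for d in dims:
        points = [tuple(rng.random() for _ in range(d)) for _ in range(n)]
        sizes[d] = skyline_size(points)
    return sizes

sizes = sizes_by_dimension(300, range(1, 6))
```

With one dimension the skyline is the single minimum; with more dimensions fewer pairs of points are comparable, so more points remain undominated and both the skyline step and the later top-k step have more tuples to process.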

[Figure 5 plot: x-axis, number of dimensions; y-axis ticks from 0 to 140; series: B-algorithm and N-algorithm]

REFERENCES

[1] Mark Weiser, "The Computer for the 21st Century", Scientific American, Sep. 1991, 265(3): 94-104.

[2] M. Satyanarayanan, "Pervasive Computing: Vision and Challenges", IEEE Personal Communications, Aug. 2001, vol. 8, no. 4, pp. 10-17.

[3] Brando C, Goncalves M, Gonzalez V, "Evaluating top-k skyline queries over relational databases", in Proceedings of International Conference on Database and Expert Systems Applications. San Jose, USA: Springer, 2007, pp. 254-263.

[4] R. Fagin, A. Lotem, M. Naor, "Optimal aggregation algorithms for middleware", in Proceedings of the ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, 2001, pp. 102-113.

[5] Pei Cao, Zhe Wang, "Efficient Top-K query calculation in distributed networks", in Proceedings of the 23rd Annual ACM Symposium on Principles of Distributed Computing, 2004, pp. 206-215.

[6] R. Akbarinia, E. Pacitti, P. Valduriez, "Best position algorithms for top-k queries", in Proceedings of the 33rd International Conference on VLDB, 2007, pp. 495-506.

[7] S. Börzsönyi, D. Kossmann, and K. Stocker, "The Skyline Operator", in Proceedings of the International Conference on Data Engineering (ICDE), 2001, pp. 235-254.

[8] P. Godfrey, R. Shipley, and J. Gryz, "Maximal Vector Computation in Large Data Sets", in Proceedings of the International Conference on Very Large Data Bases, VLDB, 2005.

[9] J. Chomicki, P. Godfrey, J. Gryz, and D. Liang, "Skyline with Presorting", in Proceedings of the International Conference on Data Engineering (ICDE), 2003, pp. 717-719.

[10] D. Kossmann, F. Ramsak, and S. Rost, "Shooting Stars in the Sky: An Online Algorithm for Skyline Queries", in Proceedings of the International Conference on Very Large Data Bases, VLDB, 2002, pp. 275-286.

[11] K.-L. Tan, P.-K. Eng, and B. C. Ooi, "Efficient Progressive Skyline Computation", in Proceedings of the International Conference on Very Large Data Bases, VLDB, 2001, pp. 301-310.

[12] Balke, W-T., Güntzer, U., "Multi-Objective Query Processing for Database Systems", in Proceedings of the International Conference on Very Large Databases (VLDB), pp. 936-947.

[13] Goncalves, M., Vidal, M.E., "Preferred Skyline: A Hybrid Approach Between SQLf and Skyline", in Andersen, K.V., Debenham, J., Wagner, R. (eds.) DEXA 2005. LNCS, vol. 3588, pp. 375-384. Springer, Heidelberg (2005).

[14] Lo, E., Yip, K., Lin, K-I., Cheung, D., "Progressive Skylining over Web-Accessible Databases", Journal of Data and Knowledge Engineering 57(2), pp. 122-147 (2006).

[15] X. Lin, Y. Yuan, Q. Zhang, Y. Zhang, "Selecting Stars: The k Most Representative Skyline Operator", in Proc. of the Int. IEEE Conf. on Data Engineering (ICDE), Istanbul, Turkey, 2007, pp. 86-95.

[16] Man Lung Yiu, Nikos Mamoulis, "Efficient Processing of Top-k Dominating Queries on Multi-Dimensional Data", VLDB 2007, pp. 483-494.

[17] Goncalves, M., Vidal, M.E., "Top-k Skyline: A Unified Approach", in Proceedings of OTM (On the Move) 2005 PhD Symposium, pp. 790-799 (2005).
