complex spatio-temporal pattern queries

12
Complex Spatio-Temporal Pattern Queries Marios Hadjieleftheriou, George Kollios Computer Science Department Boston University {marioh, gkollios}@cs.bu.edu Petko Bakalov, Vassilis J. Tsotras Computer Science Department University of California, Riverside {pbakalov, tsotras}@cs.ucr.edu Abstract This paper introduces a novel type of query, what we name Spatio-temporal Pattern Queries (STP). Such a query specifies a spatio- temporal pattern as a sequence of distinct spatial predicates where the predicate tempo- ral ordering (exact or relative) matters. STP queries can use various types of spatial pred- icates (range search, nearest neighbor, etc.) where each such predicate is associated (1) with an exact temporal constraint (a time- instant or a time-interval), or (2) more gen- erally, with a relative order among the other query predicates. Using traditional spatio- temporal index structures for these types of queries would be either inefficient or not an applicable solution. Alternatively, we pro- pose specialized query evaluation algorithms for STP queries With Time. We also present a novel index structure, suitable for STP queries With Order. Finally, we conduct a compre- hensive experimental evaluation to show the merits of our techniques. 1 Introduction Spatio-temporal data management has received a lot of attention recently, mainly due to the emergence of location based services and advances in telecommuni- cations (cheap GPS devices, ubiquitous cellular net- works, RFIDs, etc.) As a result, large amounts of spatiotemporal data is produced daily, typically in the form of trajectories. The need to efficiently analyze and query this data requires the development of so- Permission to copy without fee all or part of this material is granted provided that the copies are not made or distributed for direct commercial advantage, the VLDB copyright notice and the title of the publication and its date appear, and notice is given that copying is by permission of the Very Large Data Base Endowment. To copy otherwise, or to republish, requires a fee and/or special permission from the Endowment. Proceedings of the 31st VLDB Conference, Trondheim, Norway, 2005 phisticated techniques. Previous research has concen- trated on various spatio-temporal queries, mainly fo- cusing on range searches and nearest neighbor varia- tions [18, 19, 9, 16, 14], or mining tasks like extracting patterns and periodicities from spatiotemporal trajec- tories [17, 11]. This paper introduces a novel problem, what we term Spatio-temporal Pattern Queries (STP). Given a large collection of spatiotemporal trajectories, an STP query retrieves all trajectories that follow user defined movement patterns in space and time. There are many practical applications where STP queries appear. For example, “Identify all vehicles that were very close to all three sniper attacks in Maryland (the locations and times of the attacks are known)” or “Locate products that left the factory a month ago, were stored in one of the warehouses near the dock, and loaded on a ship (locations can be tracked by us- ing RFIDs)”. While there have been various previ- ous works addressing the problem of pattern discovery [17, 11], to the best of our knowledge, there has been no previous work on the orthogonal problem, the STP queries. We represent a spatio-temporal pattern as a se- quence of distinct spatio-temporal predicates, where the temporal ordering (exact or relative) of the pred- icates matters. STP queries can have arbitrary types of spatial predicates (e.g., range search, nearest neigh- bor, etc.), where each predicate may be associated (1) with an exact temporal constraint that is either a time- instant or a time-interval (STP Queries With Time), or (2) more generally, with a relative order (STP Queries With Order). An STP query With Time example is: “Find ob- jects that crossed through region A at time T 1 , came as close as possible to point B at a later time T 2 and then stopped inside circle C some time during interval (T 3 ,T 4 )”. An STP query With Order is: “Find ob- jects that first crossed through region A, then passed as close as possible from point B and finally stopped inside circle C”. Here only the relative order of the This work has been supported by NSF grants IIS-0133825, IIS-0308213, and IIS-0330481. 877

Upload: independent

Post on 11-Nov-2023

0 views

Category:

Documents


0 download

TRANSCRIPT

Complex Spatio-Temporal Pattern Queries

Marios Hadjieleftheriou, George KolliosComputer Science Department

Boston University

{marioh, gkollios}@cs.bu.edu

Petko Bakalov, Vassilis J. TsotrasComputer Science Department

University of California, Riverside

{pbakalov, tsotras}@cs.ucr.edu

Abstract

This paper introduces a novel type ofquery, what we name Spatio-temporal PatternQueries (STP). Such a query specifies a spatio-temporal pattern as a sequence of distinctspatial predicates where the predicate tempo-ral ordering (exact or relative) matters. STPqueries can use various types of spatial pred-icates (range search, nearest neighbor, etc.)where each such predicate is associated (1)with an exact temporal constraint (a time-instant or a time-interval), or (2) more gen-erally, with a relative order among the otherquery predicates. Using traditional spatio-temporal index structures for these types ofqueries would be either inefficient or not anapplicable solution. Alternatively, we pro-pose specialized query evaluation algorithmsfor STP queries With Time. We also present anovel index structure, suitable for STP queriesWith Order. Finally, we conduct a compre-hensive experimental evaluation to show themerits of our techniques.

1 Introduction

Spatio-temporal data management has received a lotof attention recently, mainly due to the emergence oflocation based services and advances in telecommuni-cations (cheap GPS devices, ubiquitous cellular net-works, RFIDs, etc.) As a result, large amounts ofspatiotemporal data is produced daily, typically in theform of trajectories. The need to efficiently analyzeand query this data requires the development of so-

Permission to copy without fee all or part of this material isgranted provided that the copies are not made or distributed fordirect commercial advantage, the VLDB copyright notice andthe title of the publication and its date appear, and notice isgiven that copying is by permission of the Very Large Data BaseEndowment. To copy otherwise, or to republish, requires a feeand/or special permission from the Endowment.

Proceedings of the 31st VLDB Conference,Trondheim, Norway, 2005

phisticated techniques. Previous research has concen-trated on various spatio-temporal queries, mainly fo-cusing on range searches and nearest neighbor varia-tions [18, 19, 9, 16, 14], or mining tasks like extractingpatterns and periodicities from spatiotemporal trajec-tories [17, 11]. This paper introduces a novel problem,what we term Spatio-temporal Pattern Queries (STP).Given a large collection of spatiotemporal trajectories,an STP query retrieves all trajectories that follow userdefined movement patterns in space and time.

There are many practical applications where STPqueries appear. For example, “Identify all vehicles thatwere very close to all three sniper attacks in Maryland(the locations and times of the attacks are known)” or“Locate products that left the factory a month ago,were stored in one of the warehouses near the dock,and loaded on a ship (locations can be tracked by us-ing RFIDs)”. While there have been various previ-ous works addressing the problem of pattern discovery[17, 11], to the best of our knowledge, there has beenno previous work on the orthogonal problem, the STPqueries.

We represent a spatio-temporal pattern as a se-quence of distinct spatio-temporal predicates, wherethe temporal ordering (exact or relative) of the pred-icates matters. STP queries can have arbitrary typesof spatial predicates (e.g., range search, nearest neigh-bor, etc.), where each predicate may be associated (1)with an exact temporal constraint that is either a time-instant or a time-interval (STP Queries With Time), or(2) more generally, with a relative order (STP QueriesWith Order).

An STP query With Time example is: “Find ob-jects that crossed through region A at time T1, cameas close as possible to point B at a later time T2 andthen stopped inside circle C some time during interval(T3, T4)”. An STP query With Order is: “Find ob-jects that first crossed through region A, then passedas close as possible from point B and finally stoppedinside circle C”. Here only the relative order of the

This work has been supported by NSF grants IIS-0133825,IIS-0308213, and IIS-0330481.

877

spatial predicates is important, independently of whenexactly they occur in time.

The straightforward approach for answering STPqueries is to evaluate the query pattern on all trajec-tories one by one using a linear scan on a sequentiallystored archive. Clearly, this approach has prohibitivecost and in some cases might be infeasible due to verylarge database sizes and due to the expensive nature ofthe distance function used for STP queries With Or-der (as will be seen later on). When considering STPqueries With Time, another straightforward approachwould be to use a traditional spatio-temporal indexstructure [18, 19, 21, 9] to index the trajectories andutilize the index to evaluate the query predicates in-dividually; respective answers can be combined in theend. When these STP queries consist only of range spa-tial predicates such a solution might work well in var-ious practical cases. Nevertheless, this approach doesnot work for spatial predicates that need to be eval-uated conjointly, like nearest neighbors. Consider thefollowing example: “Find the object trajectory thatcrossed as close as possible from point A at time T1

and then, as close as possible from point B at a latertime T2”. Individually evaluating each predicate can-not provide the trajectory that minimizes the distancefrom both points. Alternatively, we show how to adaptthe traditional best first search approach with a com-bined distance function (e.g., the sum of distances ofeach trajectory from the query points) to answer STPqueries With Time. With careful evaluation strategieswe can guarantee that each needed page from the in-dex is loaded only once, during the evaluation of allpredicates.

Nevertheless, traditional spatio-temporal indexstructures would be a very inefficient solution for an-swering STP queries With Order. Since only the rel-ative order of the predicates is significant, the spatio-temporal index will have to retrieve the whole temporalevolution of each object. For such queries, we proposea novel indexing technique that enables efficient eval-uation by storing ‘order’ information inside the index.

To summarize, the contributions of this paper arethe following: (1) We introduce and formalize a novelquery type (STP) that combines general spatial pred-icates with temporal constraints (STP queries WithTime) and/or relative ordering (STP queries With Or-der). (2) We propose specialized query evaluation al-gorithms for these two types of STP queries, as well asa novel index structure, suitable for STP queries WithOrder. (3) Finally, we present an extensive experimen-tal evaluation of the proposed techniques.

2 Problem Definition

Consider a large archive of object trajectories and awell-defined trajectory representation such that the lo-cation of the objects can be computed for any giventime (the framework that will be presented is indepen-dent of the underlying trajectory representations). An

D1

D2

T

rP

12

qq

X

Y

2t

t

1t

4t

3

Figure 1: An example STP query With Time.

STP query is expressed as a sequence Q of arbitrarylength m of ordered spatio-temporal predicates of theform

Q = {(Q1, T1), (Q2, T2), . . . , (Qm, Tm)}where in each pair (Q,T ), Q represents a spatial pred-icate and T a temporal constraint. For simplicity butwithout loss of generality, in the rest, Q will express ei-ther a range (R) or a nearest neighbor (NN) query. Aswill become clear, our framework can be adapted easilyto handle other spatial predicates as well. T is eithera time-instant (t), a time-interval (∆t) or empty (∅).Inherently the temporal constraints impose a strict or-dering on the spatial predicates. When a temporalconstraint is empty, ordering will be implied by the ac-tual position of the associated predicate in the querysequence. An alternative query expression mechanismappeared in [3], where regular expressions were used torepresent mobility patterns. More details and limita-tions of this approach appear in the related work.

We say that a trajectory satisfies an STP query if itsatisfies all spatio-temporal predicates at once:

• A range predicate is trivially, and individually,satisfied by a trajectory if the object is containedinside the given range anytime during the specifiedtemporal constraint.

• A nearest neighbor predicate is satisfied only withrespect to all other NN searches as well. We saythat a trajectory satisfies the NN predicates if itminimizes the sum of the distances from thosepredicates during the given temporal constraints.

An example STP query With Time is shownin Figure 1. The query depicted is Q ={(NN(q1), t1), (NN(q2), t2), (R(r), [t3, t4))}, and issatisfied by the trajectory that minimizes the sum ofdistances from points q1, q2 at times t1, t2, respectively,and crosses region r anytime between [t3, t4).

3 STP Query Algorithms

For ease of exposition we first present our solutions forSTP Queries With Time, i.e., STP queries that con-tain spatial predicates with non-empty temporal con-straints. Subsequently, we discuss solutions for themore general STP Queries With Order, i.e., queriesthat are ordered sequences of spatial predicates with-out temporal constraints.

878

X

Q

Q1

P

T

Y

2

DLBUBD

Figure 2: Approximating a trajectory with multipleMBRs.

3.1 STP Queries With Time

Given that these queries contain both spatial and tem-poral constraints, they can be answered using special-ized evaluation strategies on existing spatio-temporalindex structures. The spatio-temporal index can serveas a filtering step that will reduce the number of ac-cesses to raw trajectory data by pruning trajectoriesthat do not satisfy the pattern and improving queryperformance (against the linear search) by orders ofmagnitude.

For simplicity we adopt a general trajectory index-ing scheme. Nevertheless, the algorithms that will bepresented do not make any assumptions about the un-derlying index, as long as it can answer efficiently near-est neighbor and range queries . Without loss of gener-ality, we assume that the trajectories are approximatedusing a large number of Minimum Bounding Rectan-gles [19, 9], which are then indexed using a spatio-temporal index structure like the R-tree [8, 18] or theMVR-tree [21, 9]. With minor modifications to ourframework, other approximation techniques are appli-cable as well. Every MBR inserted in a leaf level of thetree is associated with the identifier of the trajectorythat it belongs to, and bounds a small time-interval ofthe object’s movement history. An example of this in-dexing scheme is shown in Figure 2, where a trajectoryhas been approximated using a total of three MBRs.

For the rest of this section we assume that the MBRsare indexed using a secondary spatio-temporal indexstructure, with data entries pointing to the raw trajec-tory data on disk. Raw data is stored on sequentialdata pages per trajectory.

Among STP queries with time, we first discussqueries that contain multiple range predicates, sincethey can be evaluated in a straightforward way. Wethen consider queries with multiple NN predicatesand present various algorithmic approaches to evalu-ate them. Finally we discuss queries that consist ofcombinations of range and NN predicates.

3.1.1 Range Predicate Evaluation

Assuming a pattern query that contains only rangespatio-temporal predicates, there are two obvious eval-uation strategies to consider. The first approach pro-duces the candidate results of all predicates concur-rently using the index, and then loads from storageonly the trajectories that belong to the intersection ofthe partial answer sets. The second strategy evaluatesthe most selective predicate first (assuming selectivityinformation is available), then the raw trajectory data

is loaded from storage and the rest of the range predi-cates are answered in main memory, using the retrieveddata. Which approach is better depends on the actualselectivities of the queries and the size of the trajecto-ries.

3.1.2 Nearest Neighbor Predicate Evaluation

Consider an STP query that contains only nearestneighbor predicates. The basic idea behind our ap-proach is to use the index structure and the trajectoryMBRs to compute approximate object distances thatwill enable fast pruning of many trajectories, withouthaving to load the raw trajectory data. Our techniqueruns one best-first-search algorithm per query pred-icate that returns, successively, the object with thesmaller distance from that predicate. The total dis-tance of a single trajectory from the query can be com-puted as the sum of individual distances from all pred-icates. Intuitively, by utilizing the trajectory MBRswe can compute both upper and lower-bounds of theactual distance of a trajectory from a given predicate,as shown in Figure 2. By combining approximate dis-tances for all query predicates we will be able to prunetrajectories that have lower-bounding distances largerthan any known upper-bound.

A simple 1-dimensional example is shown in Fig-ure 3(a). This STP query is expressed as Q ={(NN(0), 1), . . . , (NN(0), 5)}. Conceptually, it can bethought of as a time-interval nearest neighbor pattern:“Locate the object that stays closer to the origin dur-ing time-interval [1, 5]”. Trajectory P1 is the answer tothis query since it minimizes the sum of distances fromall five points, with distance D(Q,P1) = 5.5. Trajec-tory P3 does not qualify since it partially intersects thequery lifetime and has infinite distance for some time-instants. Using this simple example, we will illustratetwo evaluation strategies, called lazy and eager, andwill discuss their advantages.

������������������������

������������������������

����������������������������

����������������������������������������

����������������������������������������

�����������������������������

���������������

x

D 3 1LB (q , P )4D(q , P )1

t t

x

42154321

2P

1P

1P

Q

2

Q

P

1−NN 2−NN

(a) (b)

−2

P3

P

2

1

0

−1

3

53

Figure 3: (a) Three trajectories and an STP query. Foreach query point the 1-NN MBRs (solid gray) are retrieved.(b) The MBRs belonging to P1 cover the query, while thoseof P2 do not cover points 4 and 5. Given D(Q, P1) = 5.5and LBD(Q, P2) ≥ 5.7, trajectory P2 can be pruned (anymissing MBRs for points 4 and 5 should be at least as faras D(q4,5, P1) = 1 unit from the query).

Let D(Q, P ) denote the actual distance of trajec-tory P from Q. Also, let LBD(Q, P ) (UBD(Q, P ))denote a lower-bound (upper-bound) distance of P

879

from Q computed by using the distances of the MBRapproximations of P from the query predicates. Athreshold Λ needs to be computed so that all trajec-tories with LBD(Q, P ) > Λ can be pruned. In orderto achieve this we incrementally locate the 1-NN, 2-NN, etc. MBRs individually for each query predicate.These MBRs should also contain the predicate in thetemporal dimension. After a number of MBRs havebeen reported per qi, for every discovered trajectoryP (i.e., a trajectory for which at least one MBR hasbeen reported) there are two cases: (1) the union ofP ’s MBRs contain all query points in the time dimen-sion and we say that P covers the lifespan of Q, or (2)some points of Q are not covered. For example, in Fig-ure 3(a) the 1-NN MBRs for each point are reportedfirst (solid gray rectangles). At this step, no discoveredtrajectory MBRs cover all query points. In Figure 3(b)the 2-NN MBRs are retrieved. This time, the MBRsbelonging to trajectory P1 cover all query predicates.

In the first case both LBD(Q, P ) and UBD(Q, P )can be computed without having to access the rawtrajectory data. The upper-bound can be used as apruning threshold Λ. The lower-bound can be usedto prune the trajectory according to an already com-puted Λ. In the second case, a pessimistic approx-imation of LBD(Q, P ) can be computed. For eachquery predicate qi that is covered by P the partiallower-bounding distance is equal to D(qi, P ). For thequery points that are not covered by P the maximumsuch distance D(qi, P

′) is used, regardless of whichtrajectory it corresponds to. Referring back to Fig-ure 3(b), points 4 and 5 are not covered by P2, thus:LBD(Q, P2) ≥

∑3i=1 D(qi, P2)+D(q4, P1)+D(q5, P1).

Due to the incremental discovery of trajectory MBRs,the computed approximation is still a lower-bound ofthe actual distance. By discovering trajectory MBRsincrementally and continuously improving the com-puted bounds, the number of raw trajectory data thatneed to be loaded from storage is reduced substantially.

δt

4

q1

q2

q3

0 2 3 5 6 T0

1

2

3

4

X

}

1

q A B1−NN1

Figure 4: Evaluating time-interval predicates.

In order to evaluate time-interval temporal predi-cates, for every query predicate qi we need to locate thenearest MBRs considering all time-instants containedin the given interval. An example is shown in Figure4, for δt = ±1. In order to perform correct pruning,the value of D(q, P ) should be set to the closest MBRto qi in ∆t. This MBR is the one that minimizes thedistance from the multi-dimensional region outlined bythe query predicate when conceptually sweeping time-

Algorithm 1 Lazy Nearest Neighbor STP queriesWith TimeInput: Query Q = {(NN(q1), t1), . . . , (NN(qn), tn)}Output: Nearest neighbor PB

1: Structure S ← ∅ , Set U ← ∅, PQ1,...,n ← ∅2: Dq1,...,qn = 0, Λ =∞, k = 1, PB = ∅3: Initialize priority queues PQi

4: while true do5: ConcurrentBestFirstSearch(k, Q, PQi, U,S, Dqi)6: for P in S do7: if LBD(Q, P ) > Λ then S.remove(P ),

U .enqueue(P )8: else if Q Covers P then9: PD = GetTrajectoryData(P )

10: if D(Q, PD) < Λ then Λ =D(Q, PD), PB = P

11: else S.remove(P ), U .enqueue(P )12: for P in S do13: if LBD(Q, P ) < Λ then stop = false14: elseS.remove(P ), U .enqueue(P )15: if stop then break16: k+ = 117: end while18: Return PB

interval ∆t (e.g., the 2-dimensional line AB shown inFigure 4) and, thus, the search can be performed as atraditional NN search with an MBR query predicate.

Given the secondary index assumption, the cost ofaccessing the raw data is equivalent to one random diskaccess per trajectory. ∗ In that case it is reasonableto postpone the data reads as much as possible andutilize lower-bounds instead, such that a smaller can-didate trajectory set can be populated first. That way,we reduce the total number of accesses to the raw datastorage, which means that query performance will de-pend mostly on the index access cost (depending on thesize of the final candidate set). We call this the Lazystrategy. On the other hand, an alternative strategyis to eagerly load the raw trajectory information whenevaluating each individual predicate, and directly com-pute the actual distance of each trajectory from theSTP query. We can use the actual distances as tighterpruning thresholds that will help prune other trajecto-ries using lower-bounds, very efficiently. We call thisthe Eager strategy. The best strategy to use dependson the characteristics of the dataset (how many pagesare occupied per trajectory) and the query properties.

The lazy algorithm is shown in Algorithm 1. † Allreported MBRs are probed into structure S after be-ing removed from the priority queues during the NNsearches. The structure is responsible for keeping up-dated the current lower-bounds of each trajectory andthe query predicate coverage information. For that

∗We assume that individual trajectories are stored sequen-tially on disk if they occupy more than one page.

†The algorithm for the eager strategy needs only simple mod-ifications and is thus omitted.

880

Algorithm 2 Concurrent Best First SearchInput: k, Q, PQn, U,S, Dq1,...,qn

1: for i = 1 to n do2: while PQi not empty do3: Entry N = PQi.top4: if N is a node then5: for j such that PQj .contains(N) do6: PQj .remove(N)7: for C in N do8: if C not in U and C covers qj then9: PQj .enqueue(C, D(Q, C))

10: else if N is a data entry then11: S.insert(N)12: Update Dqi

13: if k-NN is discovered, continue from 114: end for

purpose a hash table indexed by trajectory identifiersis constructed. The hash table contains one entry pertrajectory. Each entry stores the current lower-bounddistance and a bit vector, one bit per query predi-cate, indicating which predicates have not been cov-ered yet by the trajectory MBRs. Every time a newMBR is discovered, the corresponding trajectory entryis retrieved, the appropriate bits are set and the lower-bound distance is updated. Examining if a trajectorycovers the query is a binary AND operation on the bitvector. For trajectories that cover the query, the ac-tual distance D(Q, P ) is computed and stored as thelower-bound. For other trajectories, the appropriateD(qi, P ) values are used.

The function ConcurrentBestFirstSearch (Algo-rithm 2) utilizes the best first search nearest neighboralgorithm to find the k-NN of every query point. Thesearch is incremental so that the priority queues canbe preserved and reused between subsequent execu-tions. Array D(qi) is maintained by storing for eachquery point the distance from the last entry removedfrom the top of the corresponding priority queue. Bothfunctions can be straightforwardly modified to sup-port time-interval predicates as well as top-k searches,where more than one trajectories are retrieved. Forcompleteness, we use the extended algorithm for ourexperimental evaluation, but present the simpler ver-sions here for ease of exposition.

Since the algorithm runs a number of concurrentNN searches, it is unavoidable that some tree nodeswill be retrieved more than once for a number of dif-ferent predicates, even though each best first searchwill access the minimum required number of nodes in-dividually [10]. In cases where there are no memoryconstraints LRU buffering can be used. Since manyNN searches will have similar priority queues that sharemany common entries, especially for query predicatesthat lie close in space and time, locality of reference en-sures that LRU buffering techniques will have a highhit ratio after the buffer becomes hot. Alternatively,the NN searches can be performed in a round robin

fashion, and every time a node is retrieved from thetop of one priority queue, the rest of the queues arescanned and if the node is already contained in any ofthem, then it is directly replaced with its children. Itcan be shown easily that due to the sequence of inser-tions in the queues, this strategy will guarantee thatevery node that is ever accessed by any predicate, willneed to be accessed only once for all queues.

3.1.3 Combinations

For STP Queries With Time that contain combina-tions of range and NN searches, the evaluation orderof the predicates is restricted, by construction, sincethe satisfiability of the NN predicates depends on thesatisfiability of all the range searches. There are two al-ternatives: (1) Evaluate some or all of the range pred-icates first and then proceed with nearest neighborsevaluation and (2) evaluate all nearest neighbors first,by taking into account the satisfiability of the rangesearches whenever a candidate for updating thresholdΛ is retrieved. If the selectivity of any range predicateis known to be very small then, after evaluating thispredicate, all qualifying trajectories can be loaded fromstorage and the rest of the queries can be processed inmain memory. Otherwise, all range predicates can beevaluated in advance and the disqualified trajectoriescan be pinned in set U .

3.2 STP Queries With Order

Assume now that the user is not interested in the ex-act times that the spatial predicates were satisfied, butonly in the order in which they did. Hence these STPqueries can be expressed as ordered sequences of spa-tial predicates of the form Q = {Q1, Q2, . . . , Qn}. LetSQm(P, tk) denote the fact that trajectory P , duringits lifetime, satisfied predicate Qm at time-instant tk.A single trajectory may satisfy any given predicate formultiple time-instants. Formally, we say that trajec-tory P satisfies query Q if there exists an n-length se-quence SQ1(P, t1), . . . , SQn(P, tn) such that ti ≤ tj , forall 1 ≤ i < j ≤ n. Note that the time-interval betweenany two consecutive ti, tj can be arbitrary.

A solution that utilizes existing spatio-temporal in-dex structures (as in the previous section) will certainlybe inefficient for STP queries with order, since no tem-poral constraints are specified. In order to evaluatea single predicate, the complete evolution of the in-dex on the temporal dimension will have to be exam-ined. That is, a range query will need to encompassthe whole lifespan of the dataset on its temporal di-mension (which for large trajectory archives becomesprohibitively expensive). Instead, we propose efficientsolutions to STP queries With Order by using special-ized index structures.

3.2.1 Range Predicate Evaluation

Instead of using a (3-dimensional) spatio-temporal in-dex one could project out the temporal dimension from

881

each trajectory and index the resulting objects with a2-dimensional spatial index. A spatial range query canthen easily retrieve all trajectories that satisfy it, irre-spective of the time that they did so. Intuitively, tofind if a trajectory satisfies the STP query, all rangepredicates would be evaluated first, the intersection ofthe individual result sets would be computed, and theremaining trajectories would be retrieved in order tocheck if they really do satisfy the predicates in the cor-rect order.

There are however clear disadvantages with this so-lution. First, this structure does not inherently pre-serve order. Moreover, the quality of the 2-dimensionalindex is expected to be worse than the 3-dimensionalspatio-temporal one, due to increased overlapping ofprojected MBRs.

Ideally, we would like to have an index structurethat maintains the order with which each trajectorysatisfies an arbitrary sequence of range predicates. Oneway to accomplish this would be to create a ‘pred-icate index’, that is, an index on the range predicatethemselves, instead of the object trajectories. For eachrange predicate rm, the structure associates an orderedlist of Srm(Pl, tk) entries that contains all trajectoriesthat satisfy the predicate. Entries in such list are or-dered by trajectory identifier (Pl). Since a trajectorycan satisfy predicate rm many times throughout itsduration, the entries of the same trajectory are fur-ther ordered by time (tk). Then, given an STP queryQ = {R(r1), R(r2), . . . , R(rn)}, all range predicatescan be evaluated concurrently using an operation simi-lar to a “merge-join” among the n lists associated withthese predicates. Using this structure the correct an-swers are retrieved in sorted trajectory identifier order.

There are two cases where entries from the lists canbe skipped (thus resulting in faster processing of themerge-join). First, whenever in a given predicate listrm a trajectory identifier (say Ps) is encountered thatis larger than all the trajectory identifiers currently atthe top of the other lists, entries from these lists cor-responding to trajectories Pr (r < s) can be effectivelyskipped. Essentially, predicate rm cannot be satisfiedby any of the trajectories with smaller identifiers (sim-ply because such identifiers did not appear in the listof rm). Second, the time-instant at which a trajec-tory satisfied a corresponding predicate, would assistin skipping list entries whenever the ordering of thequery predicates was not obeyed.

Clearly not all possible (ad hoc) spatial range pred-icates can be indexed. Instead, a space partitioninggrid can be used such that an arbitrary range querycan be represented with reasonable accuracy as a listof cells. Predicate list are created for each cell, storingthe trajectories that intersect with the cell.

For ease of exposition assume that a regular grid isused to partition the space. Each cell is representedwith a unique cell identifier. An illustrative example

Algorithm 3 Range STP queries With OrderInput: Query Q = {R(r1), . . . , R(rn)}Output: Trajectories satisfying the predicates1: for i = 1 to n do2: Li = combined list of cells intersected by ri

3: Candidate set U ← ∅4: for i = 1 to n do5: for j = 1 to n do6: if L1[i].id 6∈ Lj then break7: else8: Let k be the first entry for L1[i].id in Lj

9: while L1[i].id = Lj [k].idV

L1[i].t > Lj [k].tdo k+ = 1

10: if L1[i].id 6= Lj [k].id then break11: U = U ∪ L1[i].id12: end for13: Retrieve trajectories in U and verify the results

is shown in Figure 5. Trajectory P3 crossed cell B attime-instant 3. At time-instant 4 trajectory P1 enteredthe cell. Trajectory P2 followed at time-instant 9 andreturned at time-instant 13. For simplicity, in this ex-ample we assumed that each trajectory remained inthe cell for a single time-instant. Then the list for cellB becomes {(P1, 4), (P2, 9), (P2, 13), (P3, 3)}.

B CA

,5),( P2 ,8)

P1 ,4),(P2 ,9),(P2 ,13),(P3 ,3)B:(

C:( P2 ,4),(P ,10),(P32 ,2)

P2D

3

P1

A:(P1,3),(P1

12

...

...

P

Figure 5: A uniform partitioning and the representationof cells as lists of sorted trajectory identifiers.

The proposed approach is shown in Algorithm 3. Ifa range predicate is contained within a cell, we over-estimate the result by using the cell’s list instead. Ifa range predicate is larger than the cell size, we ap-proximate it by the smallest enclosing range of cells.In order to use the above join algorithm a sorted listneeds to be materialized first for the combined cells,which requires time linear to the total number of en-tries in the lists since they are already sorted. Withthis approach a verification step is needed to removefalse positives that are not actually contained in theoriginal range predicate. A fine grid granularity willlead to a small number of such false positives. Usinga grid resolves the overlapping issues of the traditionalMBR indices while it also prunes many object trajec-tories from the search by preserving temporal order-ing. Moreover, the elimination of trajectories that donot satisfy some of the predicates can happen withouthaving to access the raw data from storage, contraryto the available alternatives for spatio-temporal indexstructures. Furthermore, it achieves substantial spacesavings since it maintains small representations of tra-jectories, approximated within the grid granularity. At

882

the same time, it discretizes the space, keeping only afixed number of lists.

Even though in the preceding analysis we use uni-form grids for simplicity, in practice, we do not needto maintain a simple regular grid over the data. Alter-natively, we may use any dynamic space partitioningstructure like the adaptive grid files, kdb-trees, etc. [6].This would guarantee that all the cells of the structurecontain approximately the same number of data, andthe corresponding lists have similar sizes.

3.2.2 Nearest Neighbor Predicate Evaluation

We now consider STP queries With Order that con-tain only nearest neighbor predicates. An object tra-jectory satisfies the query if it minimizes the sum ofthe distances from the query predicates and in the cor-rect order. One straightforward approach is to use theincremental ranking nearest neighbor algorithm intro-duced for STP queries With Time with two neededmodifications: (1) The pruning threshold Λ needs tobe updated only if a candidate trajectory truly min-imizes the distance in the correct order and not ar-bitrarily; and (2) the best first search queues shouldbe populated even with entries that do not containthe predicates in the temporal dimension. However,this solution is expected to yield poor query perfor-mance since it does not inherently prune using predi-cate ordering but needs to postpone ordering verifica-tions until a trajectory is loaded from storage. Also,the increased number of nodes that will be insertedin the queues will slow down execution, intensify thesearch cost, and increase the memory requirements.Moreover, similar with the case of range predicates, anapproach that first uses a 2-dimensional projection toeliminate the temporal dimension will not improve per-formance. Finally, all discovered MBRs during evalu-ation need to be retained, since they might be part ofthe best overall answer given the ordering of the pred-icates, even if they lie farther than other MBRs froma specific query predicate. This means that trajectorylower-bounds cannot be incrementally improved duringevaluation, thus, no trajectories can be pruned unlessa truly small Λ is computed. Hence, this technique willneed to load an excessive number of MBRs in order toprune all candidates and terminate the search. This iscorroborated by our experimental evaluation.

On the other hand, the space partitioning indexstructure proposed in the previous section can be usedto speed up the execution of these queries as well. Oncemore, we utilize the incremental ranking algorithm de-scribed above. However, instead of using the best firstsearch strategy per query predicate, the algorithm iter-atively en-queues and examines all the cells adjacent tothe cell containing the query. For every query predicateqm a sorted queue of satisfiability predicates Sqm(P, ti)is maintained. This queue is populated with all theentries contained in adjacent cells. In each phase, theprocess increases the number of adjacent cells exam-

ined by moving one cell further away from the queryin every direction (in the beginning it examines thecell containing the query, then the eight cells adja-cent to the query, and so on). For all new entriesSqm

(P, ti) that are added in the queue at each step,the lower-bound distance LBD(qi, P ) is computed pes-simistically; i.e., assuming that the actual trajectorywould be as close as possible to the query.

Then, all the predicates are evaluated in order. Forevery new cell added in any query predicate queue, thealgorithm joins it with the queues of all other pred-icates. Trajectories that visit the predicates in thecorrect order are loaded from storage as soon as theyhave appeared in all queues, and the exact distancesare computed. The pruning threshold Λ is updatedaccordingly. The exact distance computation is post-poned for trajectories that do not follow the right or-der. Next, the lower bound distance from the queryneeds to be computed for all candidate trajectories thatappear in at least one of the queues. Assume that atrajectory entry Sqm

(P, t) exists in the queue for qm.The minimum partial distance of P from qm is equalto MinDist(qm, C), where C is the cell that containsP at time-instant t. Assume that no entries for tra-jectory P are contained in the queue of predicate qm.Then, P has not been discovered yet for qm, thus ithas to lie at least as far as the farthest explored pointfor qm in every direction. This distance is equal tothe minimum of the distances of qm from the sides ofthe rectangle defined by the perimeter of the exploredcells. Given the total lower bounding distance of eachtrajectory from the query and Λ, trajectories can besafely pruned in every step.

X

H

B

O R

11

C

...

...

N65

3

P2

1P

12 P1

2

Q

Q

Figure 6: An example of the incremental nearest neighboralgorithm.

The actual algorithm is omitted since the modifi-cations can be straightforwardly applied in Algorithm1. The en-queue, join and merge procedure is shownin Algorithm 4. We further illustrate the details ofthe algorithm using an example. Figure 6 depicts anSTP query with two NN predicates q1, q2 in that or-der, a 6× 4 grid, and three trajectories. The grid cellshave identifiers A,B, . . . ,X, Y starting from the lowerleft corner (not all cell identifiers are depicted). Inthe first phase of processing, the combined list of q1

consists only of the entries in cell H, which containselement Sq1(P2, 8). Similarly, the list of q2 containsentry Sq2(P3, 7). The partial distance of P2 from q1

is set to 0, while the pessimistic distance from q2 isequal to the minimum distance of q2 from the sides

883

Algorithm 4 En-queue, Join and Merge OperationInput: l, Q, CLn,S, Dq1,...,qn

1: for i = 1 to n do2: Locate next level of cells l3: for each new cell list L do4: for T ∈ L do5: S.insert(T )6: Join with lists CLj , j 6= i to see if T sat-

isfies the order (similarly to the rangequery algorithm), and if yes tag it in Sfor subsequent retrieval

7: Insert T in list CLi

8: Update Dqi

9: end for

of cell R. The total lower bound LBD(Q, P2) fromthe query is thus equal to 0 + MinDist(q2, R) (simi-larly LBD(Q, P3) = 0 + MinDist(q1,H)). In the nextphase the algorithm evaluates all cells adjacent to q1

and q2. Starting with q1 the process merges the listsof cells A,B, . . . , N,O (the shaded region around q1),with the list of q1 (cell H). Before the merging opera-tion, each cell is joined with q2, and entries that followthe query ordering are loaded from storage for verifica-tion. For example, consider cell B being merged withquery list q1. The list of B contains element SB(P2, 9).First, we join B with q2 and deduce that there is noentry in q2 with the same identifier yet, so no actionneeds to be taken. Likewise, when merging cell N ,element SN (P3, 2) is probed into list q2, where it isalready contained with a satisfiability predicate cor-responding to time-instant 7. Thus, P3 satisfies theorder and, in addition, it has appeared in the lists ofall predicates, so it is loaded from storage for an exactdistance computation and Λ is updated accordingly.

Trajectories for which the computed lower boundsare larger than Λ can be safely pruned (none in thiscase yet). The rest of the cells are merged in thesame way. The list for q1 becomes (P1, 3− 8), (P2, 6−9), (P3, 2−4). In the same phase, when merging cell Xwith query q2, we join element SX(P2, 1) with the list ofq1, where four entries for P2 are already present. Sinceall entries are associated with time-instants larger than1, we deduce that no sequence of satisfaction predicatesfollows the order of the query, so no action needs to betaken. After all cells around q2 have been merged, thenew partial distances of all trajectories can be com-puted. For trajectory P3 the exact distance is known.For the rest of the trajectories, since they did not sat-isfy the order, the partial distance is equal to the min-imum distance of the query point from the sides of therectangle corresponding to the perimeter of the newlyadded cells (the far side of cell K in this case as shownwith the arrow). The intuition is that since the pointsfound so far do not satisfy the ordering yet, any pointthat does must be at least as far as this distance fromthe query After computing the new lower bounds it canbe deduced that trajectories P1 and P2 can be pruned

given the current Λ, and the algorithm stops.Another interesting problem is computing the min-

imum ordered distance of a trajectory from a givenquery efficiently. Assuming discrete time or approxi-mate trajectory representations, one way is to exhaus-tively check all possible sequences of locations. A bet-ter way is to use the Threshold Algorithm (TA) [4].Given a query Q and a trajectory P , the n-length se-quence of locations that minimizes the distance fromall predicates and also satisfies the query order canbe found by using the incremental top-k ranking con-cept of TA. Our algorithm will perform even better,since whenever the actual query to trajectory distanceneeds to be evaluated, the following information is al-ready computed: (1) That the trajectory definitely sat-isfies the predicate order, and (2) that all the trajectorypoints that are candidates for minimizing the total dis-tance are already available. Given the above and theexpensive nature of this distance function, it is clearthat our technique will offer substantial improvementover the exhaustive search or any other technique thatneeds to evaluate this distance from scratch.

3.2.3 Combinations

Using our proposed grid structure combinations ofrange and nearest neighbor STP queries can be an-swered as well. The same approach as with STPqueries With Time can be utilized. The incremental al-gorithm can be modified to take into account the rangepredicates during the joining phase of the lists. Morespecifically, in the beginning, for every range predicatewe create a combined list of the entries of the con-stituent cells which remain fixed throughout execution.Then, while evaluating the NN predicates, every timea new cell is added in the list of a specific predicate,we join the sorted list of the cell with all other listsof nearest neighbor and range predicates and instantlyprune the entries that do not satisfy any of the rangepredicates with the correct order. The added benefit ofthe algorithm is that, depending on the selectivity ofthe range queries, it is expected that the candidate en-tries for inclusion in the lists will decrease substantiallyduring evaluation.

4 Experimental Evaluation

We proceed with a comprehensive experimental eval-uation of the proposed algorithms. We run variousexperiments with synthetic datasets to test the behav-ior of each technique under different settings. All ex-periments were run on an Intel Pentium-4 2.4 GHzprocessor running Linux, with 512 MBytes main mem-ory. We generated synthetic datasets of moving objecttrajectories. The dataset represents the freeway net-work of Indiana and Illinois. The 2-dimensional spatialuniverse is 1,000 miles long in each direction and con-tains up to 500,000 objects. Object velocities followa Normal distribution with mean 60 mph, and stan-dard deviation 15 mph. We run simulations for 400

884

minutes (time-instants). To illustrate the generality ofour framework, we index the data using two differentindexing techniques, the R-tree and the MVR-tree.

4.1 STP Queries With Time

For this type of queries trajectories were split into mul-tiple MBRs and then indexed using an R-tree and anMVR-tree. For our experiments we approximated thedatasets using on average 20 MBRs per trajectory (fora total of approximately 6,000,000 MBRs) [9]. We setthe page size for all indices to 4 KBytes, and used a256 pages LRU buffer (e.g., in comparison to a totalof 61,477 pages present in an R-tree indexing 300,000trajectories). For clarity we present the results only forthe R-tree in the graphs, but very similar conclusionswere drawn for the MVR-tree as well.

We use four query sets: RandomPattern, Rel-evantPattern, CombinedPattern and Interval-Pattern. All query sets contain sets of NN predicatesonly, except from the CombinedPattern that con-tains combinations of NN and range predicates. Allsets have 100 queries and each query consists of anumber of predicates at increasing time instants. Forall queries we provide the top-20 results. Queries inthe RandomPattern set have predicates that lie onconsecutive nodes of the network, reachable from eachother given the maximum velocity of objects and thetemporal constraints of the query. The Relevant-Pattern set contains queries that are formed frompartial segments of object trajectories already con-tained in the dataset, slightly skewed in time and spacefrom the original, with random location/time-instanttuples dropped in-between. With the CombinedPat-tern set, we evaluate the algorithms using query setsthat are a mix of NN and range predicates. In partic-ular, we randomly replace a number of NN predicateswith range searches centered at the same positions.Finally, the IntervalPattern set is generated simi-larly to the RelevantPattern, and includes only NNpredicates but with time-interval temporal constraints.

Comparison vs. Linear Scan. Since our querysets contain NN predicates, we can only compareagainst a linear trajectory scan (since the traditionalsearch algorithms for spatial index structures cannotbe applied). We compared the linear scan against ourtechniques for all subsequent experiments. Although,since the total I/O of this straightforward approachwas, on average, three orders of magnitude higher thanthe other algorithms, we exclude it from the graphs inorder to preserve detail. Hence in the rest we concen-trate on the relative performance of the lazy and eageralgorithms.

Performance vs. Number of Predicates. Wetest our algorithms for queries with increasing numberof predicates. The results are shown in Figure 7 for theRandomPatterns and Figure 8 for the Relevant-Patterns, where we plot the index I/O, the trajectoryI/O and the total I/O for each technique. Clearly,

for RandomPatterns and numerous predicates thelazy algorithm deteriorates substantially. Many indexnodes need to be accessed before a tight threshold canbe computed, hence, the NN searches become very ex-pensive. On the contrary, for RelevantPatterns allalgorithms remain practically unaffected, since a near-est neighbor is discovered fast and the searches termi-nate faster.

In terms of average trajectory loads (LR-T and ER-T), for RandomPatterns all algorithms need to loadan increasing number of trajectories, as the numberof predicates increases. In order to find the top-20results the probability that many trajectories satisfythe query decreases proportionately and the searchesstart expanding correspondingly, thus many trajecto-ries are being discovered. The eager algorithm loads alarger number of trajectories in all cases. Assuming onerandom I/O per trajectory access, the eager achievesthe best performance overall, balancing the trajectoryaccesses and the index overhead, especially for largenumbers of predicates, where the searches become ex-pensive. For RelevantPatterns all algorithms loadthe same number of candidate trajectories. Since thereexist many trajectories similar to the given query, thetop-20 candidates are found very fast, and only fewextra trajectories cannot be pruned using the lowerbounds.

0

500

1000

1500

2000

2500

3000

5 10 20

Number of predicates

Avg

. in

dex

I/O

LR-I

ER-I

0

500

1000

1500

2000

2500

3000

3500

5 10 20

Number of predicatesT

ota

l I/O

LR-ILR-TER-IER-T

Figure 7: RandomPattern: Number of predicates.

75

76

77

78

79

80

81

5 10 20Number of predicates

Avg

. in

dex

I/O

LR-I

ER-I

0

50

100

150

200

250

300

350

400

5 10 20

Number of predicates

To

tal I

/O

LR-I LR-TER-I ER-T

Figure 8: RelevantPattern: Number of predicates.

Performance vs. Dataset Size. Next, we runscale-up experiments with increasing numbers of tra-jectories. Figures 9 and 10 summarize the results. Weused both RandomPattern and RelevantPatternquery sets with 10 predicates per query. As expected,we observe that for all index structures the averagenumber of node accesses per query increases, since thetotal number of nodes in the structures increases aswell, and the trees become deeper. The total numberof trajectory accesses remains stable for Relevant-Patterns, since the top-20 results per query are dis-covered very fast, irrespective of the total number of

885

objects in the dataset. On the contrary, for Random-Patterns there is a linear increase in the number oftrajectory access, since with a larger dataset size andlooser thresholds, more trajectories are discovered dur-ing the NN searches. In terms of total I/O the eageralgorithm is once more the best choice overall, provid-ing better results for the RandomPatterns.

0

200

400

600

800

1000

1200

1400

100K 300K 500K

Number of trajectories

Avg

. in

dex

I/O

LR-IER-I

0

500

1000

1500

2000

100K 300K 500K

Number of trajectories

To

tal I

/O

LR-I LR-TER-I ER-T

Figure 9: RandomPattern: Number of trajectories.

0

50

100

150

200

100K 300K 500K

Number of trajectories

Avg

. in

dex

I/O

LR-I

ER-I

050

100150200250300350400450

100K 300K 500K

Number of trajectories

To

tal I

/O

LR-I LR-TER-I ER-T

Figure 10: RelevantPattern: Number of trajectories.

Query Runtime vs. Number of Predicates.Figure 11 reports the average computational cost (asthe total wall-clock time, averaged over a number ofexecutions) for top-20 queries of various numbers ofpredicates. Doubling the number of predicates dou-bles the amount of time required to answer the Ran-domPattern queries, as expected due to lack of goodcandidates. In contrast, the number of predicates doesnot affect RelevantPattern queries, which are eas-ier to answer.

RandomPattern

0

5

10

15

20

25

5 10 20Number of predicates

Avg

. qu

ery

run

tim

e (s

ecs)

LR-IER-I

RelevantPattern

0

0.5

1

1.5

2

2.5

5 10 20Number of predicates

Avg

. qu

ery

run

tim

e (s

ecs)

LR-I ER-I

Figure 11: Query runtime.

Performance vs. Buffer Size. Figure 12 plotsthe effect of the buffer size on query performance. Weuse 10 predicates per query and top-20 nearest neigh-bors. Since we run multiple NN searches concurrentlywe expect some searches to access the same nodes atapproximately the same time. Larger buffer sizes helpalleviate the problem of loading these nodes multipletimes. We can see from the graph that an 1 Mbytebuffer is adequate for eliminating most superfluousnode accesses. As expected, the lazy algorithm ben-efits the most, since it is the one with the heaviestindex access cost.

RandomPattern

0

500

1000

1500

2000

2500

128 512 1024

Buffer size (Kbytes)

Avg

. in

dex

I/O

LR-I

ER-I

RelevantPattern

0

100

200

300

400

500

600

128 512 1024

Buffer size (Kbytes)

Avg

. in

dex

I/O

LR-I

ER-I

Figure 12: Buffer size.

Interval Queries. We performed experiments us-ing the IntervalPattern queries. We used a fixedtime-interval ∆t equal to ten time-instants for all querypredicates. For comparison purposes Figure 13 showsthe results of the eager algorithm for queries with time-instant predicates and interval predicates (IER) (thesame set of queries was used, where each time-intervalconstraint was replaced with a time-instant). As ex-pected, interval queries have the larger index accesscost in comparison with time-instant queries, due tothe larger number of nodes that intersect with the tem-poral constraints of the predicates. A surprising resultis that interval queries load much fewer candidate tra-jectories from storage. Intuitively, the time-intervalalgorithm locates very similar trajectories to the querya lot faster, by performing a more robust searchingon the time dimension, thus a very tight threshold iscomputed earlier during the evaluation process.

0

50

100

150

200

250

300

350

5 10 20

Number of predicates

Avg

. in

dex

I/O

IER-I

ER-I

0

100

200

300

400

500

5 10 20

Number of predicates

To

tal I

/O

IER-I IER-TER-I ER-T

Figure 13: IntervalPattern: Number of predicates.

Combined Queries. Finally, we tested our algo-rithms for CombinedPattern queries that containboth NN and range predicates. Figure 14 shows theeffect of increasing the range query selectivity to theindex and trajectory access cost, when using top-20queries. Line ER-I corresponds to the cost of answer-ing a query that contains only NN predicates. RER-1and RER-2 correspond to answering the same query,where one NN predicate has been replaced with a rangepredicate. Clearly, for very small range queries that donot contain the actual nearest neighbors of the origi-nal query, the remaining NN searches need to expandthe search substantially in order to retrieve their clos-est candidates that also satisfy the range predicate.Here, the cost for range query side 2.5 has being clippedin order to preserve the detail in the graph. The ac-tual cost is 3,000 index I/Os on average. The problemis exacerbated since a large number of nearest neigh-bors (twenty in this case) is requested. As the rangequery increases in size, since the original twenty near-est neighbor trajectories are all contained in the range,the rest of the searches discover them very fast. In this

886

case, the index cost for RER-1 increases due to the ex-tra cost for answering the range query in the beginning.On the contrary, the trajectory access cost decreasessince many new trajectories discovered during the NNsearches can be pruned, given the answer of the rangequery. Alternative RER-2 gave similar results to theoriginal query, as expected, since the range predicatehas no effect on the execution of the ranking algorithm,pruning entries only after they have been loaded fromstorage.

0

50

100

150

200

2.5 5 10 20Range side length

Avg

. in

dex

I/O

RER-1RER-2ER-I

0

100

200

300

400

500

600

700

2.5 5 10 20

Range side length

To

tal I

/O

RER-1-I RER-1-TRER-2-I RER-2-TER-I ER-T

Figure 14: CombinedPattern: Range selectivity.

Conclusion. In general our algorithms yielded veryefficient pruning for STP queries, especially when con-sidering that the only available alternative is the linearscan. The eager strategy performed the best overall,in all cases, balancing the trajectory access cost andthe index overhead.

4.2 STP Queries With Order

For this type of queries we selected a 100×100 uniformgrid and created the corresponding sorted lists per cell,as described in Section 3.2. We compare our grid basedindex against the straightforward approach of using a2-dimensional R-tree.

4.2.1 Range Queries

For this experiment we created a query set that con-tains only range predicates. The queries were createdsimilarly to the RelevantPatterns set, by replacingthe NN predicates with range predicates. For the R-tree index, assuming that the selectivity of the pred-icates is not known, all predicates were evaluated insequence and the intersection of the individual resultsets was returned.

Performance vs. Number of Predicates. First,we evaluate the techniques using varying numbersof range predicates, given a fixed range query sidelength equal to 15 miles (with the universe size being1, 000× 1, 000 miles). The results are shown in Figure15. The CellList in the graph corresponds to our pro-posed technique, and the R-tree to the 2-dimensionalR-tree approach. The CellList has significantly re-duced index I/O, due to more efficient pruning duringthe list join step. Moreover, another benefit of our ap-proach is the reduced number of raw trajectory datathat need to be retrieved. Our approach yields fromtwo up to three orders of magnitude less trajectory ac-cesses than the R-tree (the graph shows the results ina logarithmic scale). This is expected, since the R-treehas no way of pruning trajectories due to predicate or-

dering constraints; it does not store such informationin the index.

0

200

400

600

800

1000

5 10 20Number of predicates

Avg

. in

dex

I/O

CellListR-tree

1

10

100

1000

10000

5 10 20

Number of predicates

To

tal I

/O(l

og

sca

le)

CellListR-tree

Figure 15: RelevantPattern: Number of predicates.

Performance vs. Range Selectivity. The nextexperiment evaluates the performance with respectto increasing range predicate selectivity. Figure 16presents the results. As expected, when the size ofthe range increases the index I/O rises proportionately,since a larger number of candidates are retrieved. Sim-ilar observations hold for trajectory accesses as well.Clearly, the CellList can answer STP queries With Or-der, by far more efficiently than the R-tree, and witha structure that is smaller in size

0

100

200

300

400

500

600

700

800

10 15 20Range side length

Avg

. in

dex

I/O

CellListR-tree

1

10

100

1000

10000

10 15 20

Range side length

To

tal I

/O(l

og

sca

le)

CellListR-tree

Figure 16: RelevantPattern: Range selectivity .

4.2.2 Nearest Neighbor Queries

For these queries we used the RelevantPatterns to eval-uate the CellList both using the lazy and the eagerstrategies. We also run a simple eager algorithm on a2-dimensional R-tree, with the ranking algorithm de-scribed in Section 3.2.2. We compared each techniquefor an increasing number of predicates. The resultsare reported in Figure 17. The lazy strategy workedthe best in this case, requiring a small number of indexaccesses as well as a very small number of trajectory ac-cesses. Naturally, delaying trajectory retrieval in thiscase is beneficial due to the double pruning based bothon lower-bounds and the predicate ordering. It is ev-ident that, contrariwise, the eager strategy has threeorders of magnitude more trajectory accesses, since itunnecessarily retrieves trajectories that do not satisfythe order. The R-tree approach, as predicted in Sec-tion 3.2.2, deteriorated to almost a sequential scan ofboth the index structure and the trajectory storage,especially for larger numbers of predicates.

For STP Queries with Order, the CellList is a robustsolution with excellent performance, both for rangeand nearest neighbor predicates. In particular, the lazystrategy for these queries works the best, in contrastto STP queries With Time, where the eager strategywas the best alternative.

887

1

10

100

1000

10000

100000

5 10 20Number of predicates

Avg

. in

dex

I/O

(lo

g s

cale

)

Lazy Eager R-tree

1

10

100

1000

10000

100000

1000000

5 10 20

Number of predicates

To

tal I

/O(l

og

sca

le)

Lazy Eager R-tree

Figure 17: RelevantPattern: Number of predicates.

5 Related Work

Modelling and expressing complex spatio-temporalqueries has been investigated in [7]. In this paper weuse a very general expression mechanism, that can uti-lize any previous definition of spatio-temporal predi-cates. Applying the incremental ranking algorithms foranswering STP queries with NN predicates has beeninspired by the Threshold Algorithm (TA) proposedin [4]. The TA algorithm is used for ranking objectswith multiple features given any monotonic preferencefunction. Here, we show how this algorithm can be ap-plied in the case of spatio-temporal data using indexstructures instead of materialized sorted lists. Tradi-tional nearest neighbor queries have been addressedin previous work including [1, 5, 10, 20]. None ofthese approaches has focused on efficient evaluation ofcombinations of NN predicates (with the exception ofGNN queries [16]). To the best of our knowledge, nowork has appeared for answering combinations of spa-tial predicates with order and without temporal con-straints.

Related to the STP range queries with order (afterthe grid reduction) is the work in [12] which consid-ers non-contiguous pattern queries on string sequences.However, our structures are more suitable for spatio-temporal data and can answer more general queries.Mouza and Rigaux [3] introduced mobility patternqueries, where patterns are expressed using regular ex-pressions. This work is a special case of STP queries,concentrating on STP queries With Order and Rangepredicates. The techniques introduced therein cannothandle distance based predicates, neither explicit tem-poral constraints. Furthermore, the patterns are lim-ited to predefined ranges, which have been determinedin advance by partitioning the space into areas of in-terest. In contrast, here we present more general tech-niques that can handle arbitrary query ranges, distancebased predicates, and temporal constraints.

Any trajectory indexing scheme that can answernearest neighbor, range queries, and other spatial pred-icates can be used with the STP query algorithm de-scribed here [22, 15, 2, 13, 18, 21]. were there is an in-herent uncertainty in moving object trajectories [23].

6 Conclusions

We introduced and formalized a novel type of spatio-temporal pattern query for trajectories. We designedspecialized spatio-temporal index structures and algo-rithms to effectively reduce the computation and I/O

operations per query, when compared with existing ap-proaches. Finally, we presented a thorough experimen-tal evaluation. For future work we plan to extend thetechniques for answering pattern queries that imposeconstraints between groups of predicates, and for in-cluding relative temporal constraints.

References

[1] R. Benetis, C. Jensen, G. Karciauskas, and S. Saltenis.Nearest neighbor and reverse nearest neighbor queries formoving objects. In IDEAS, pages 44–53, 2002.

[2] V. P. Chakka, A. Everspaugh, and J. M. Patel. Indexinglarge trajectory data sets with seti. In CIDR, 2003.

[3] C. du Mouza and P. Rigaux. Mobility patterns. In STDBM,pages 1–8, 2004.

[4] R. Fagin, A. Lotem, and M. Naor. Optimal aggregationalgorithms for middleware. In PODS, pages 102–113, 2001.

[5] H. Ferhatosmanoglu, I. Stanoi, D. Agrawal, and A. El Ab-badi. Constrained nearest neighbor queries. In SSTD, pages257–278, 2001.

[6] V. Gaede and O. Gunther. Multidimensional access meth-ods. ACM Computing Surveys, 30(2):170–231, 1998.

[7] R. H. Guting, M. H. Bhlen, M. Erwig, C. S. Jensen, N. A.Lorentzos, M. Schneider, and M. Vazirgiannis. A founda-tion for representing and querying moving objects. TODS,25(1):1–42, 2000.

[8] A. Guttman. R-trees: A dynamic index structure for spatialsearching. In SIGMOD, pages 47–57, 1984.

[9] M. Hadjieleftheriou, G. Kollios, V. J. Tsotras, andD. Gunopulos. Efficient indexing of spatiotemporal objects.In EDBT, pages 251–268, 2002.

[10] G. R. Hjaltason and H. Samet. Distance browsing in spatialdatabases. TODS, 24(2):265–318, 1999.

[11] N. Mamoulis, H. Cao, G. Kollios, M. Hadjieleftheriou,Y. Tao, and D. W. Cheung. Mining, indexing, and queryinghistorical spatiotemporal data. In SIGKDD, pages 236–245,2004.

[12] N. Mamoulis and M.L. Yiu. Non-contiguous sequence pat-tern queries. In EDBT, pages 783–800, 2004.

[13] M. F. Mokbel, T. M. Ghanem, and W. G. Aref. Spatio-temporal access methods. IEEE Data Engineering Bulletin,26(2):40–49, 2003.

[14] M. F. Mokbel, X. Xiong, and W. G. Aref. SINA: Scalableincremental processing of continuous queries in spatiotem-poral databases. In SIGMOD, 2004.

[15] M. Nascimento, J. Silva, and Y. Theodoridis. Evaluationof access structures for discretely moving points. Spatio-Temporal Database Management, pages 171–188, 1999.

[16] D. Papadias, Q. Shen, Y. Tao, and K. Mouratidis. Groupnearest neighbor queries. In ICDE, pages 301–312, 2004.

[17] W.-C. Peng and M.-S. Chen. Developing data allocationschemes by incremental mining of user moving patterns ina mobile computing system. TKDE, 15(1):70–85, 2003.

[18] D. Pfoser, C. S. Jensen, and Y. Theodoridis. Novel ap-proaches in query processing for moving object trajectories.In VLDB, pages 395–406, 2000.

[19] K. Porkaew, I. Lazaridis, and S. Mehrotra. Querying mobileobjects in spatio-temporal databases. In SSTD, pages 59–78, 2001.

[20] N. Roussopoulos, S. Kelley, and F. Vincent. Nearest neigh-bor queries. In SIGMOD, pages 71–79, 1995.

[21] Y. Tao and D. Papadias. MV3R-Tree: A spatio-temporalaccess method for timestamp and interval queries. InVLDB, pages 431–440, 2001.

[22] Y. Theodoridis, T. Sellis, A. Papadopoulos, andY. Manolopoulos. Specifications for efficient indexing inspatiotemporal databases. In SSDBM, pages 123–132, 1998.

[23] G. Trajcevski, O. Wolfson, K. Hinrichs, and S. Chamber-lain. Managing uncertainty in moving objects databases.TODS, 29(3):463–507, 2004.

888