1
SINA: Scalable Incremental Processing of Continuous Queries
in Spatio-temporal Databases
Mohamed F. Mokbel, Xiaopeng Xiong, Walid G. Aref
Presented by
Nilu Thakur
Prasad Sriram
SIGMOD 2004 June 13-18, Paris, France.
2
Outline
Introduction
Problem Definition
Contributions
Key Concepts: Shared Execution, Hashing, Invalidation, Joining
Validations
Assumptions
Rewrite Today
3
Introduction (1/3)
Moving query on stationary objects
Find the nearest gas station(s) within 1 mile of the moving red car
4
Introduction (2/3)
Another example
Moving query on moving objects
Continuously find all police cars within 3 miles of the moving red car
5
Introduction (3/3)
Another example
Stationary query on moving objects
Continuously find all vehicles within 1 mile of my house
6
Problem Definition
Input: a large number of mobile/stationary objects and continuous spatio-temporal queries
Output: fast, complete, and correct results
Objectives: continuous evaluation; scalability in the number of queries; reporting only updates to the previous answer
Constraints: any delay in query response may result in an outdated answer; limited network bandwidth
7
Contributions
Shared execution paradigm: groups similar queries in a query table and evaluates them as a spatial join between moving queries and moving objects. This differs from previous approaches that rely on R-tree and Q-index structures: for moving queries on moving objects, SINA uses a spatial join that assumes no index structure.
Incremental evaluation (the most significant contribution): maintains an in-memory table that stores positive and negative updates. A negative update may cancel a previous positive update, and vice versa. The set of updates is sent to the queries every T time units.
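The positive/negative cancellation described above can be illustrated with a small buffer. This is a minimal Python sketch, not SINA's implementation: the class name `UpdateBuffer`, the `(query, sign, object)` tuple layout, and the explicit `flush()` call at the end of each period T are illustrative assumptions.

```python
from typing import Dict, List, Tuple

Update = Tuple[str, str, str]  # (query_id, sign, object_id), sign is "+" or "-"


class UpdateBuffer:
    """Hypothetical in-memory table of pending positive/negative updates."""

    def __init__(self) -> None:
        # (query_id, object_id) -> sign of the net pending update
        self._pending: Dict[Tuple[str, str], str] = {}

    def add(self, query_id: str, object_id: str, sign: str) -> None:
        key = (query_id, object_id)
        previous = self._pending.get(key)
        if previous is not None and previous != sign:
            # Opposite updates within one period cancel: the client's answer
            # for this (query, object) pair is unchanged, so nothing is sent.
            del self._pending[key]
        else:
            self._pending[key] = sign

    def flush(self) -> List[Update]:
        """Return the net updates due at the end of a period T and clear the table."""
        out = sorted((q, s, o) for (q, o), s in self._pending.items())
        self._pending.clear()
        return out


buf = UpdateBuffer()
buf.add("Q1", "P1", "+")
buf.add("Q1", "P1", "-")  # cancels the pending "+"
buf.add("Q2", "P3", "-")
print(buf.flush())  # [('Q2', '-', 'P3')]
```

Only net changes leave the buffer, which is what keeps the per-period message small under limited bandwidth.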
9
Shared Spatio-temporal Join
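[Figure: Shared Execution. Shared-operator panel: one Shared Operator joins the Stream of Moving Objects with the Stream of Moving Queries through a shared memory buffer among all continuous queries; a Split step routes the +/- updates to Q1, Q2, …, QN. Separate-operator panel: individual operators (Stationary Range, Moving kNN, Moving Range) over the Stream of Moving Objects, each sending +/- updates to its own query Q1, Q2, …, QN.]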
Slides Courtesy [Mokbel et al]
10
Key Concepts (continued)
Shared execution: the spatial join can use an R-tree index for stationary objects and a Q-index for stationary queries; no index structure is used when both queries and objects are moving.
Incremental evaluation: three phases, hashing, invalidation, and joining
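To make the shared-execution idea concrete, here is a hedged sketch in which every range query is registered once in a single uniform grid and each object is tested only against the queries in its own cell, so all queries share one join instead of each running its own operator. The uniform in-memory grid and the helper names (`cell_of`, `cells_of`, `shared_range_join`) are assumptions for illustration; the slides do not specify this structure for the shared operator.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Cell = Tuple[int, int]


def _index(v: float, n: int, size: float) -> int:
    # Clamp so points on the outer boundary still land in a valid cell.
    return max(0, min(int(v // (size / n)), n - 1))


def cell_of(p: Point, n: int, size: float) -> Cell:
    return (_index(p[0], n, size), _index(p[1], n, size))


def cells_of(r: Rect, n: int, size: float) -> List[Cell]:
    """All grid cells overlapped by the query rectangle r."""
    (i0, j0), (i1, j1) = cell_of((r[0], r[1]), n, size), cell_of((r[2], r[3]), n, size)
    return [(i, j) for i in range(i0, i1 + 1) for j in range(j0, j1 + 1)]


def shared_range_join(objects: Dict[str, Point], queries: Dict[str, Rect],
                      n: int, size: float) -> Set[Tuple[str, str]]:
    # One grid shared by every query, instead of one operator per query.
    grid: Dict[Cell, List[str]] = defaultdict(list)
    for qid, rect in queries.items():
        for c in cells_of(rect, n, size):
            grid[c].append(qid)
    result: Set[Tuple[str, str]] = set()
    for oid, p in objects.items():
        # Each object is tested only against queries registered in its own cell.
        for qid in grid.get(cell_of(p, n, size), []):
            x0, y0, x1, y1 = queries[qid]
            if x0 <= p[0] <= x1 and y0 <= p[1] <= y1:
                result.add((qid, oid))
    return result


objects = {"p1": (1.0, 1.0), "p2": (8.0, 8.0), "p3": (5.0, 5.0)}
queries = {"q1": (0.0, 0.0, 2.0, 2.0), "q2": (4.0, 4.0, 9.0, 9.0)}
print(sorted(shared_range_join(objects, queries, n=4, size=10.0)))
# [('q1', 'p1'), ('q2', 'p2'), ('q2', 'p3')]
```

No tree index is built for either side; the grid is rebuilt per evaluation, which fits the case where both queries and objects move.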
11
State diagram of SINA
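[Figure: state diagram with three phases, HASHING, INVALIDATION, and JOINING. The stream of moving objects and moving queries enters In-Memory Hashing, which reports positive updates as incremental results. On "Memory Full or Timeout", the Invalidation phase runs against the DISK grid and produces negative updates, followed by the Memory-disk Join, which produces positive updates. The incremental results (negative and positive updates) are sent to queries Q1, Q2, …, Qn-1, Qn, and the cycle is Done.]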
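The transitions in the state diagram can be read as a tiny state machine. The sketch below is an assumed simplification (the names `Phase` and `next_phase` are hypothetical): hashing continues until memory is full or the period times out, then invalidation and joining each run once before the cycle starts over.

```python
from enum import Enum


class Phase(Enum):
    HASHING = "hashing"
    INVALIDATION = "invalidation"
    JOINING = "joining"


def next_phase(phase: Phase, memory_full: bool = False, timeout: bool = False) -> Phase:
    """Transition rule read off the state diagram (a sketch, not SINA's code)."""
    if phase is Phase.HASHING:
        # Keep hashing incoming updates until memory is full or the period T ends.
        return Phase.INVALIDATION if memory_full or timeout else Phase.HASHING
    if phase is Phase.INVALIDATION:
        # Negative updates are done; continue with the memory-disk join.
        return Phase.JOINING
    # Joining is done: results were sent to Q1..Qn and memory was cleared.
    return Phase.HASHING


phase = Phase.HASHING
for full, tick in [(False, False), (False, True), (False, False), (False, False)]:
    phase = next_phase(phase, memory_full=full, timeout=tick)
    print(phase.name)  # HASHING, INVALIDATION, JOINING, HASHING (one per line)
```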
12
Key Concepts: An Illustrative Example
Q1 to Q5 represent five continuous range queries, and P1 to P9 represent objects. White circles: moving objects (P1, P2, P3, P4). Black circles: non-moving objects. Dashed lines: moving queries (Q1, Q3, Q5).
13
Key concept: Step I-Hashing
Two in-memory hash tables, each with N buckets, store the moving objects and the moving queries
One in-memory query table keeps the upper-left and lower-right corners of each query region
Each incoming update is hashed, probed against the other table, and stored; in the example, the positive update (q3, +p2) is reported
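A minimal sketch of the hashing step, assuming a hash function the slides do not give: the hypothetical `HashingPhase` class splits the x-axis into N strips to form the N buckets, probes each arriving object or query against the opposite table in the same bucket(s), and then stores it. The query table keeps only the two corners of each region. For brevity, the sketch does not check whether a pair was already part of the previous answer.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max): the two corners
Update = Tuple[str, str, str]


def _inside(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


class HashingPhase:
    """Hypothetical sketch of the in-memory hashing step (hash function assumed)."""

    def __init__(self, n_buckets: int, size: float) -> None:
        self.n, self.size = n_buckets, size
        self.object_buckets: List[Dict[str, Point]] = [{} for _ in range(n_buckets)]
        self.query_buckets: List[List[str]] = [[] for _ in range(n_buckets)]
        self.query_table: Dict[str, Rect] = {}  # query id -> region corners
        self._bucket_of_object: Dict[str, int] = {}

    def _bucket(self, x: float) -> int:
        # Assumed hash: split the x-axis of [0, size) into n vertical strips.
        return max(0, min(int(x / self.size * self.n), self.n - 1))

    def add_query(self, qid: str, rect: Rect) -> List[Update]:
        if qid in self.query_table:  # the query moved again: drop its old buckets
            for bucket in self.query_buckets:
                if qid in bucket:
                    bucket.remove(qid)
        self.query_table[qid] = rect
        updates = []
        for b in range(self._bucket(rect[0]), self._bucket(rect[2]) + 1):
            # Probe the objects stored in this bucket, then store the query there.
            for oid, p in self.object_buckets[b].items():
                if _inside(p, rect):
                    updates.append((qid, "+", oid))
            self.query_buckets[b].append(qid)
        return updates

    def add_object(self, oid: str, p: Point) -> List[Update]:
        old = self._bucket_of_object.pop(oid, None)
        if old is not None:
            del self.object_buckets[old][oid]  # keep only the latest location
        b = self._bucket(p[0])
        # Probe: test the queries hashed to the same bucket.
        updates = [(qid, "+", oid) for qid in self.query_buckets[b]
                   if _inside(p, self.query_table[qid])]
        # Store: keep the object for the invalidation and joining steps.
        self.object_buckets[b][oid] = p
        self._bucket_of_object[oid] = b
        return updates


h = HashingPhase(n_buckets=4, size=10.0)
h.add_query("q3", (2.0, 2.0, 6.0, 6.0))
print(h.add_object("p2", (3.0, 4.0)))  # [('q3', '+', 'p2')]
```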
14
Key Concept: Step II-Invalidation
Map objects and queries to one or more cells of a disk-based N×N grid
Flush out the buckets containing moved objects and queries
For each moved object:
If the object maps to the same grid cell, it has not moved to another cell
Else, add the object entry in its new grid cell
Look for queries that contain this object, and remove the object from their answers by sending negative updates
Repeat the same procedure to invalidate moving queries
(Figure legend: query entry, object entry)
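The invalidation step can be sketched as follows. The hypothetical `GridStore` class keeps the N×N grid in memory rather than on disk, and it also records the last answer sent to each query (an assumption needed to decide which negative updates to send). An object that stays in its cell keeps its grid entry, but it can still leave a query region inside that cell, so the sketch checks query regions in both cases.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Cell = Tuple[int, int]
Update = Tuple[str, str, str]


def _inside(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


class GridStore:
    """In-memory stand-in for the disk-based N x N grid (hypothetical class)."""

    def __init__(self, n: int, size: float) -> None:
        self.n, self.step = n, size / n
        self.objects: Dict[Cell, Dict[str, Point]] = defaultdict(dict)  # object entries
        self.queries: Dict[Cell, Set[str]] = defaultdict(set)  # query entries
        self.regions: Dict[str, Rect] = {}
        self.answers: Dict[str, Set[str]] = defaultdict(set)  # last answers sent

    def _idx(self, v: float) -> int:
        return max(0, min(int(v // self.step), self.n - 1))

    def cell(self, p: Point) -> Cell:
        return (self._idx(p[0]), self._idx(p[1]))

    def add_query(self, qid: str, r: Rect) -> None:
        self.regions[qid] = r
        (i0, j0), (i1, j1) = self.cell((r[0], r[1])), self.cell((r[2], r[3]))
        for i in range(i0, i1 + 1):
            for j in range(j0, j1 + 1):
                self.queries[(i, j)].add(qid)

    def add_object(self, oid: str, p: Point) -> None:
        # Store the entry and record it in the answers of queries that contain it.
        c = self.cell(p)
        self.objects[c][oid] = p
        for qid in self.queries[c]:
            if _inside(p, self.regions[qid]):
                self.answers[qid].add(oid)

    def invalidate_object(self, oid: str, old: Point, new: Point) -> List[Update]:
        old_cell, new_cell = self.cell(old), self.cell(new)
        self.objects[old_cell].pop(oid, None)
        # Same cell: this only refreshes the location; otherwise the entry
        # moves to the new cell.
        self.objects[new_cell][oid] = new
        negatives = []
        # Only queries overlapping the old cell can hold the object in their answer.
        for qid in sorted(self.queries[old_cell]):
            if oid in self.answers[qid] and not _inside(new, self.regions[qid]):
                self.answers[qid].discard(oid)
                negatives.append((qid, "-", oid))
        return negatives


g = GridStore(n=4, size=10.0)
g.add_query("q1", (0.0, 0.0, 4.0, 4.0))
g.add_object("p1", (1.0, 1.0))
print(g.invalidate_object("p1", (1.0, 1.0), (8.0, 8.0)))  # [('q1', '-', 'p1')]
```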
15
Key concept: Step III- Joining
No additional data structure is needed. For each grid cell, two spatial joins are performed:
Join the in-memory objects with the on-disk queries
Join the in-memory moving queries with the on-disk objects
Then send the updated answers to clients and clear all in-memory data structures
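A hedged sketch of the per-cell memory-disk join: `join_cell` (a hypothetical name) performs the two joins listed above for one grid cell and emits a positive update only for pairs that are not already in the query's answer. Plain dictionaries stand in for the in-memory tables and the on-disk cell contents.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)
Update = Tuple[str, str, str]


def _inside(p: Point, r: Rect) -> bool:
    return r[0] <= p[0] <= r[2] and r[1] <= p[1] <= r[3]


def join_cell(mem_objects: Dict[str, Point], mem_queries: Dict[str, Rect],
              disk_objects: Dict[str, Point], disk_queries: Dict[str, Rect],
              answers: Dict[str, Set[str]]) -> List[Update]:
    """Hypothetical memory-disk join for one grid cell; returns positive updates."""
    positives: List[Update] = []

    def report(qid: str, oid: str) -> None:
        # Emit a positive update only if the pair is new to the query's answer.
        current = answers.setdefault(qid, set())
        if oid not in current:
            current.add(oid)
            positives.append((qid, "+", oid))

    # Join 1: in-memory (moved) objects with the on-disk queries of this cell.
    for oid, p in mem_objects.items():
        for qid, r in disk_queries.items():
            if _inside(p, r):
                report(qid, oid)
    # Join 2: in-memory moving queries with the on-disk objects of this cell.
    for qid, r in mem_queries.items():
        for oid, p in disk_objects.items():
            if _inside(p, r):
                report(qid, oid)
    return sorted(positives)


answers: Dict[str, Set[str]] = {"q1": set()}
print(join_cell({"p1": (1.0, 1.0)}, {"q2": (0.0, 0.0, 5.0, 5.0)},
                {"p3": (2.0, 2.0)}, {"q1": (0.0, 0.0, 3.0, 3.0)}, answers))
# [('q1', '+', 'p1'), ('q2', '+', 'p3')]
```

After every cell has been joined, the caller would send the collected updates to the clients and clear the in-memory tables, as on the slide.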
16
Performance Analysis (1)
Answer size
Impact of Grid Size N
17
Performance Analysis (2)
Scalability with number of objects
Scalability with number of queries
18
Performance Analysis (3)
% of moving objects
Scalability with update rates
19
Extensibility
Querying the future
K-nearest-neighbor queries
Aggregate queries
Out-of-sync clients
20
Assumptions
No computational capabilities on the client side
No storage capabilities on the client side
Both assumptions are fair, since clients are often cheap, low-battery, passive devices without computational or storage capabilities.
No assumptions about object velocity
The time interval T for sending updates to queries is assumed optimal at 10 seconds.
21
Validations
Methodology: experiments on synthetic data produced by the Network-Based Generator of Moving Objects, using the road map of the city of Oldenburg, Germany, as input; theorem proving
Validation criteria: comparison with non-incremental algorithms based on the size of the results, the impact of grid size, scalability with the number of objects, and performance in terms of CPU and I/O time
Advantages: well suited to checking the correctness and efficiency of the proposed algorithm when rich datasets with varied problem features are not available
Disadvantages: real-world conditions might differ from the experimental ones
22
Rewrite Today
Assumptions: no unreasonable assumptions are made. In fact, SINA removes some assumptions made by other related techniques.
Preserve: the incremental way of sending updates; shared execution; making no assumptions about the computational capabilities of clients
Improvements:
Incorporate techniques to determine the optimal T, i.e., the time between sending updates, either through experiments or by learning from past statistics about how valid the previous updates were
Extend SINA to handle queries involving huge object histories