Query Processing in Connectivity-Challenged Environments
Priyanka Puri, Sharma Chakravarthy, Gururaj Poornima, Mohan Kumar
Information Technology Laboratory
Computer Science and Engineering Department, The University of Texas at Arlington, Arlington, TX 76009
Email: [email protected]
http://itlab.uta.edu/sharma
• This effort is supported by AFRL under Contract Number: FA8750-09-2-0199
• Sanjay Madria and Raytheon (Waseem Naqvi) are also involved in this project
May 23, 2010 Sharma: AF Mobility Workshop
Query Processing
• Has been addressed in the context of centralized DBMSs
• Has been addressed in the context of distributed DBMSs
• Cost-based plan generation is typically used
• So, is there anything more/new to do?
[Figure: network of UAVs 1 to 5 and Ground Controllers 1, 2, ..., n]
[Figure: a second network configuration with UAVs 1, 2, 3, 5, and 6 and Ground Controllers 1, 2, ..., n]
Currently
• Data is dumped into a central server and queried
• Bandwidth and QoS issues are not addressed
• There is no collaboration among nodes
• There is no support for continuous query processing, notification, fusion, context usage, or real-time/near-real-time operation
Proposed long-term Architecture
[Figure: a network of computing nodes (unmanned vehicles, sensors, robots, PCs, servers, ground controlling devices) connected through an SOA distributed middleware. Middleware services: task planning, join computation, composition, pub/sub, context-aware notification, resource management, data management. Components: fault tolerance services, context/knowledge base, local fusion/materialization, publish/subscribe capability, and query capability over raw data, fused data, and data from other nodes. Inputs: queries, tasks, requests, continuous queries, publish/subscribe. Challenges: limited resources, mobility, heterogeneity, disconnections.]
Query Processing
MyObjects Table at each node
Attribute     Type (width)
Timestamp     8 bytes
Node_id       4 bytes
Longitude     4 bytes
Latitude      4 bytes
Obj_type      8 chars
Obj_desc      Varchar(64)
Object_ptr    Pointer (8 bytes)
Total width: 100 bytes (8 + 4 + 4 + 4 + 8 + 64 + 8)
Cardinality (number of tuples), selectivity, and the replication site of the data are known (part of the metadata).
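As a quick check of the width arithmetic, the MyObjects schema can be written down with its byte widths and summed. This is a minimal Python sketch, not the authors' code: `Column`, `MY_OBJECTS`, `tuple_width`, and `relation_size` are hypothetical names, and counting Obj_type as 8 one-byte characters and Varchar(64) at its full 64 bytes is an assumption (it is what makes the total come to 100 bytes).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    name: str
    width_bytes: int  # storage width used for size/cost estimates

# Hypothetical encoding of the MyObjects table. Widths follow the slide,
# ASSUMING 1 byte per char and Varchar(64) counted at its maximum length.
MY_OBJECTS = [
    Column("Timestamp", 8),
    Column("Node_id", 4),
    Column("Longitude", 4),
    Column("Latitude", 4),
    Column("Obj_type", 8),
    Column("Obj_desc", 64),
    Column("Object_ptr", 8),
]

def tuple_width(columns):
    """Total bytes per tuple."""
    return sum(c.width_bytes for c in columns)

def relation_size(columns, cardinality):
    """Bytes needed to ship `cardinality` full tuples of this schema."""
    return cardinality * tuple_width(columns)

print(tuple_width(MY_OBJECTS))         # 100
print(relation_size(MY_OBJECTS, 320))  # 32000
```

Combined with a cardinality from the metadata, `relation_size` gives the number of bytes a cost-based planner would charge for shipping a relation in full.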
Query Plan Format
Operation 1 | Param | Operand1 | Operand1 Loc | Operand2 | Operand2 Loc | Result Name | Result Loc
Operation 2 | Param | Operand1 | Operand1 Loc | Operand2 | Operand2 Loc | Result Name | Result Loc
...         | ...   | ...      | ...          | ...      | ...          | ...         | ...
Operation n | Param | Operand1 | Operand1 Loc | Operand2 | Operand2 Loc | Result Name | Result Loc
Operations in Plan format
Operation | Param      | Operand1 | Operand1 Loc | Operand2 | Operand2 Loc | Result Name | Result Loc
Select    | A > 100    | R1       | 1            | Null     | Null         | R1'         | 1
Project   | A1, A3, A4 | R1'      | 1            | Null     | Null         | R1''        | 1
Move      | Null       | R1''     | 1            | Null     | Null         | R''         | 2
Copy      | Null       | R1''     | 1            | Null     | Null         | R14         | 4
SemiJoin  | A = C      | R''      | 2            | R2       | 2            | SR1         | 2
Join      | B = D      | R12      | 2            | R2''     | 2            | JR1         | 2
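One way to see what the plan format buys is to load the example rows into a small data structure and ask which steps ship data between sites. The Python sketch below is hypothetical: `PlanStep`, `PLAN`, and `transfers` are illustrative names, "Null" is written as `None`, and treating a step as a transfer whenever an operand's location differs from the result location is an assumed reading of the format rather than a rule stated on the slides.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PlanStep:
    operation: str              # Select, Project, Move, Copy, SemiJoin, Join
    param: Optional[str]        # predicate or attribute list; None for "Null"
    operand1: str
    operand1_loc: int           # site holding operand 1
    operand2: Optional[str]
    operand2_loc: Optional[int]
    result_name: str
    result_loc: int             # site where the result is materialized

# The example rows from the slide.
PLAN = [
    PlanStep("Select", "A > 100", "R1", 1, None, None, "R1'", 1),
    PlanStep("Project", "A1, A3, A4", "R1'", 1, None, None, "R1''", 1),
    PlanStep("Move", None, "R1''", 1, None, None, "R''", 2),
    PlanStep("Copy", None, "R1''", 1, None, None, "R14", 4),
    PlanStep("SemiJoin", "A = C", "R''", 2, "R2", 2, "SR1", 2),
    PlanStep("Join", "B = D", "R12", 2, "R2''", 2, "JR1", 2),
]

def transfers(plan):
    """List (operand, from_site, to_site) for every operand that lives at a
    different site than the step's result (ASSUMED transfer rule)."""
    shipped = []
    for step in plan:
        for name, loc in ((step.operand1, step.operand1_loc),
                          (step.operand2, step.operand2_loc)):
            if name is not None and loc is not None and loc != step.result_loc:
                shipped.append((name, loc, step.result_loc))
    return shipped

print(transfers(PLAN))  # [("R1''", 1, 2), ("R1''", 1, 4)]
```

Only the Move and Copy rows cross sites in this example; the semijoin and join run where their operands already are.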
Plan using Semijoin chains
SELECT c1 R1
MOVE R11 To Site2
SELECT c2 R2
SJ R11 R21 : J1
MOVE J1 To Site3
SELECT c3 R3
SJ J1 R31 : J2
MOVE J2 To Site2
SJ J2 R21 : J3
MOVE J3 To Site1
SJ J3 R11 : J4
COPY R To Site7 : J
Total Cost = 14720 + 32000 = 46720
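[Figure: semijoin-chain plan over sites 1, 2, and 3, with the result at site 7. Select/project reduces R1 [1000] to R11 [800], R2 [5000] to R21 [3000], and R3 [3000] to R31 [600] (cardinalities in brackets). Semijoin results: J1 [1200], J2 [240], J3 [1200], J4 [320]. Transfer costs shown: 3200, 4800, 1920, 4800; final join cost (JCost) to site 7: 32000. Attribute labels: [lat], [long], [lat, nodeid], [long, nodeid].]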
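The total on the plan slide can be reproduced with simple arithmetic: the four transfer costs in the figure sum to 3200 + 4800 + 1920 + 4800 = 14720, and adding 32000 gives 46720. The individual numbers are also consistent with charging (tuples shipped) × (bytes per tuple), for example 800 × 4 = 3200 for R11 and 320 × 100 = 32000 for 320 full 100-byte MyObjects tuples (320 matches the J4 cardinality). The per-relation widths and the pairing of each cost with a MOVE are inferred from the figure, not stated on the slides, so this Python sketch is only a plausibility check.

```python
# Plausibility check of the semijoin-chain costs, ASSUMING each transfer
# costs (number of tuples) x (bytes per tuple). The widths below are inferred
# so that the products match the figure's cost labels.
TRANSFERS = [
    # (relation, tuples, bytes_per_tuple, destination_site)
    ("R11", 800, 4, 2),
    ("J1", 1200, 4, 3),
    ("J2", 240, 8, 2),
    ("J3", 1200, 4, 1),
]
# Final result shipped to site 7 as full 100-byte tuples (inferred).
FINAL = ("final result", 320, 100, 7)

def transfer_cost(tuples, width):
    return tuples * width

semijoin_cost = sum(transfer_cost(t, w) for _, t, w, _ in TRANSFERS)
final_cost = transfer_cost(FINAL[1], FINAL[2])
total = semijoin_cost + final_cost
print(semijoin_cost, final_cost, total)  # 14720 32000 46720
```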
Semi-join/join plan generation
• We are developing algorithms for generating the plan space and pruning it to obtain the “best” (or a “good”) plan for each input query (expressed as a join query)
• It is a cost-based algorithm based on the System R and SDD approaches, extended to include connectivity and bandwidth issues
• The complexity of plan generation is k^n, where n is the number of joins and k is the number of alternatives for each join
• We assume fewer than 5 joins in a query
• Integrate replication into the algorithm
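The k^n bound comes from choosing one of k alternatives independently for each of n joins. A minimal Python sketch of the unpruned space follows; the alternative names and unit costs are made up for illustration and are not the authors' cost model, which also prunes the space.

```python
from itertools import product

def enumerate_plans(alternatives_per_join):
    """All combinations of per-join alternatives: k**n plans for n joins
    with k alternatives each. No pruning is done here."""
    return list(product(*alternatives_per_join))

def best_plan(alternatives_per_join, cost):
    """Exhaustive search: the cheapest combination under `cost`."""
    return min(enumerate_plans(alternatives_per_join), key=cost)

# Hypothetical alternatives and unit costs, for illustration only.
UNIT_COST = {"ship-join": 5, "semijoin": 3, "use-replica": 4}
JOINS = [list(UNIT_COST)] * 4  # n = 4 joins, k = 3 alternatives each

def additive_cost(plan):
    return sum(UNIT_COST[choice] for choice in plan)

print(len(enumerate_plans(JOINS)))      # 81 == 3 ** 4
print(best_plan(JOINS, additive_cost))  # ('semijoin', 'semijoin', 'semijoin', 'semijoin')
```

With fewer than 5 joins and 3 alternatives the space stays at most 3^4 = 81 plans. A realistic cost function is not additive, because a choice for one join changes the sizes and sites seen by later joins, which is why System R-style planners use dynamic programming to prune rather than scoring each join in isolation.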
Plan Generation Alternatives
• A Query Plan (QP) is a numbered sequence of operations for executing a query
• A QP includes how data is moved as part of execution
• Plan generation alternatives:
  – Static plan: generated once and executed in a distributed manner
  – Dynamic plan: generated incrementally at each node as the query progresses, using current connectivity information
  – Parallel plan: partial plans are executed in parallel
  – Interactive plan: obtain estimates by asking the nodes that have relevant data
Static plan
• The generated physical plan includes node information for data propagation
• This is mapped to the “actual connectivity” by the physical layer for execution
• The connectivity assumed by a generated plan may no longer exist by the time the plan is executed
• In that case, either a new plan can be generated (using the same algorithm, but with current metadata) or an alternative approach can be used to modify the plan incrementally
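A static plan names source and destination sites, so before execution each transfer has to be mapped onto whatever links currently exist, possibly over several hops. This Python sketch uses breadth-first search for that mapping; BFS, the adjacency-dict link representation, and the names `find_route` and `map_static_plan` are assumptions for illustration, not the physical layer described on the slides. Any transfer with no route is reported so the plan can be regenerated or patched.

```python
from collections import deque

def find_route(links, src, dst):
    """Breadth-first search for a multi-hop route src -> dst over the links
    that are up now (links: site -> list of neighbor sites). None if unreachable."""
    if src == dst:
        return [src]
    prev = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                if nxt == dst:
                    path = [dst]
                    while prev[path[-1]] is not None:
                        path.append(prev[path[-1]])
                    return path[::-1]
                queue.append(nxt)
    return None

def map_static_plan(transfers, links):
    """Map each planned (item, from_site, to_site) transfer onto a route.
    A non-empty `broken` list means the plan must be replanned or patched."""
    routes, broken = {}, []
    for item, src, dst in transfers:
        route = find_route(links, src, dst)
        if route is None:
            broken.append((item, src, dst))
        else:
            routes[(item, src, dst)] = route
    return routes, broken

# Made-up current connectivity: site 4 has lost all of its links.
LINKS = {1: [2], 2: [1, 3], 3: [2], 4: []}
routes, broken = map_static_plan([("R11", 1, 3), ("R14", 1, 4)], LINKS)
print(routes)  # {('R11', 1, 3): [1, 2, 3]}
print(broken)  # [('R14', 1, 4)]
```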
Dynamic plan
• Generate the plan for the first join and defer the rest of the plan
  – Join plans are generated one at a time
  – Current connectivity information can be used
  – Result size estimation will also be more accurate
• Query execution and (partial) plan generation are intertwined
• Does not increase the complexity of plan generation or plan execution (compared to a static plan)
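The dynamic idea can be sketched for a chain of joins: before each join the planner re-reads connectivity, decides where to run it, and uses the observed size of the previous result instead of an estimate. In this hypothetical Python sketch, the "ship the smaller input to the larger one's site" rule, the link representation, and `join_size` (which stands in for actually executing a join) are all assumptions for illustration.

```python
def dynamic_chain_plan(relations, current_links, join_size):
    """Plan and 'execute' a chain R1 JOIN R2 JOIN ... one join at a time.

    relations: list of (name, site, tuples) for the reduced inputs
    current_links: callable returning the links up *now*, each a
                   frozenset({site_a, site_b}); re-read before every join
    join_size: callable(left_tuples, right_tuples) -> observed result size,
               a stand-in for actually running the join (ASSUMPTION)
    Returns a list of (result_name, execution_site, tuples_shipped).
    """
    name, site, size = relations[0]
    steps = []
    for rname, rsite, rsize in relations[1:]:
        if site != rsite and frozenset((site, rsite)) not in current_links():
            raise RuntimeError(f"no link between sites {site} and {rsite}; replan")
        if site == rsite:
            shipped, exec_site = 0, site
        elif size <= rsize:
            shipped, exec_site = size, rsite  # ship the running result
        else:
            shipped, exec_site = rsize, site  # ship the next relation
        size = join_size(size, rsize)         # observed, not estimated
        name, site = f"({name}*{rname})", exec_site
        steps.append((name, exec_site, shipped))
    return steps

# Made-up inputs for illustration.
RELS = [("R11", 1, 800), ("R21", 2, 3000), ("R31", 3, 600)]

def links_now():
    return {frozenset((1, 2)), frozenset((2, 3))}

def halve_smaller(left, right):
    return min(left, right) // 2  # made-up join selectivity

print(dynamic_chain_plan(RELS, links_now, halve_smaller))
# [('(R11*R21)', 2, 800), ('((R11*R21)*R31)', 3, 400)]
```

Only the next join is ever decided, so a link failure surfaces at the step that needs it rather than invalidating a whole precomputed plan.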
Parallel plan
• All local operations/computations (select, project, and even some joins) can be done in parallel
  – Join plans are still generated one at a time
  – Increases message/information exchange
  – Current connectivity information can be used
  – Result size estimation will also be more accurate
• Handling responses, plan generation, and execution may be slightly more complicated than in the previous cases
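The parallel step can be pictured as firing the local select/project at every site at once and waiting for all replies. This Python sketch uses `concurrent.futures.ThreadPoolExecutor` with threads standing in for requests to remote nodes; the row layout, `run_local_ops_in_parallel`, and `select_project_lat` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def run_local_ops_in_parallel(site_data, local_op):
    """Apply `local_op` to every site's data concurrently; return {site: result}.
    Threads stand in for the requests that would be sent to each node."""
    with ThreadPoolExecutor(max_workers=max(1, len(site_data))) as pool:
        futures = {site: pool.submit(local_op, site, rows)
                   for site, rows in site_data.items()}
        return {site: fut.result() for site, fut in futures.items()}

def select_project_lat(threshold):
    """Hypothetical local operation: select rows with lat > threshold and
    project them onto (node_id, lat)."""
    def op(site, rows):
        return [(r["node_id"], r["lat"]) for r in rows if r["lat"] > threshold]
    return op

# Made-up sample data held at two sites.
SITE_DATA = {
    1: [{"node_id": 1, "lat": 32.7}, {"node_id": 1, "lat": 30.1}],
    2: [{"node_id": 2, "lat": 33.0}],
}
reduced = run_local_ops_in_parallel(SITE_DATA, select_project_lat(32.0))
# The actual reduced sizes then feed the one-at-a-time join planning.
sizes = {site: len(rows) for site, rows in reduced.items()}
print(reduced)  # {1: [(1, 32.7)], 2: [(2, 33.0)]}
print(sizes)    # {1: 1, 2: 1}
```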
Interactive plan
• When a query comes in, send out requests for local processing and obtain processing time and result size information
• Use this information to generate partial plans
  – Join plans are still generated using information obtained interactively
  – Increases message/information exchange
  – Current connectivity information can be used
  – Result size estimation will also be more accurate
• Combines dynamic and parallel execution in an interactive manner
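An interactive planner first polls the nodes that hold relevant data and only then decides how to build partial plans. In the Python sketch below, each node is modeled as a callable that returns an `Estimate` or `None` when it cannot be reached. The ordering heuristic (smallest reported result first, ties broken by the faster node) and all names are assumptions for illustration. The processing times are made up, and the sizes reuse the reduced cardinalities from the earlier figure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

@dataclass(frozen=True)
class Estimate:
    site: int
    relation: str
    proc_time_ms: float  # node-reported local processing time
    result_tuples: int   # node-reported size after local select/project

def collect_estimates(nodes: Dict[int, Callable[[str], Optional[Estimate]]],
                      query: str) -> List[Estimate]:
    """Ask every node holding relevant data for an estimate; nodes that do
    not answer (modelled as returning None) are left out of planning."""
    replies = (ask(query) for ask in nodes.values())
    return [e for e in replies if e is not None]

def join_order(estimates: List[Estimate]) -> List[str]:
    """Hypothetical heuristic: start partial plans from the smallest reported
    result, breaking ties by the faster node."""
    ranked = sorted(estimates, key=lambda e: (e.result_tuples, e.proc_time_ms))
    return [e.relation for e in ranked]

# Made-up replies; node 4 is disconnected and never answers.
NODES = {
    1: lambda q: Estimate(1, "R1", 12.0, 800),
    2: lambda q: Estimate(2, "R2", 30.0, 3000),
    3: lambda q: Estimate(3, "R3", 9.0, 600),
    4: lambda q: None,
}
print(join_order(collect_estimates(NODES, "lat > 32")))  # ['R3', 'R1', 'R2']
```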
Replication Issues
• Algorithm for replication
  – Single-copy replication that “minimizes” the data transmission cost and “maximizes” the number of paths (to deal with connectivity)
• Algorithm for replication utilization
  – Given a replica, determine its utility in terms of query evaluation cost for a reasonable load
• Reconcile the two to arrive at a replication strategy that balances the competing tradeoffs
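The two competing goals of single-copy replication can be folded into one score per candidate site. This Python sketch is one hypothetical way to do it, not the authors' algorithm. Transmission cost is taken as query frequency × hop count × relation size, summed over the querying sites. The site's number of links is a crude stand-in for "number of paths", and `alpha` is an assumed weight that would have to be scaled to the cost units.

```python
from collections import deque

def hop_counts(links, src):
    """BFS hop distance from src to every reachable site."""
    dist, queue = {src: 0}, deque([src])
    while queue:
        node = queue.popleft()
        for nxt in links.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def choose_replica_site(links, candidates, demand, size_bytes, alpha=1.0):
    """Pick one replica site for a relation of `size_bytes`.

    demand: {querying_site: expected queries per period}
    score = sum(freq * hops * size_bytes) - alpha * (links at the site)
    Candidates that cannot reach every querying site are skipped.
    """
    best, best_score = None, None
    for site in candidates:
        dist = hop_counts(links, site)
        if any(q not in dist for q in demand):
            continue
        cost = sum(freq * dist[q] * size_bytes for q, freq in demand.items())
        score = cost - alpha * len(links.get(site, ()))
        if best_score is None or score < best_score:
            best, best_score = site, score
    return best

# Made-up topology: site 2 is a hub; sites 1 and 3 issue the queries.
LINKS = {1: [2], 2: [1, 3, 4], 3: [2], 4: [2]}
print(choose_replica_site(LINKS, [1, 2, 3, 4], {1: 5, 3: 5}, 800))  # 2
```

In this example sites 1, 2, and 3 all have the same transmission cost (8000), and the path term breaks the tie in favor of the better-connected hub.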
Thank You!