workshop on networking meets databases (netdb’07) throughput-optimized, global-scale join...
TRANSCRIPT
Workshop on Networking Meets Databases (NetDB’07)
Throughput-Optimized, Global-Scale Join Processing in Scientific Federations
Xiaodan Wang, Randal Burns, and Andreas Terzis
The Johns Hopkins University
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Data volume and geography deter scalability Performance is network bound
– Intermediate results are often hundreds of megabytes 30 sites across North America, Europe, Asia
– Community has identified 100 sites to be included
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Join Processing in Heterogeneous Networks
Query plans optimized for scalability– Without latency/response time constraints– On global-scale, heterogeneous networks– For applications that transfer hundreds of MBs among continents
Balanced utilization of all network paths– A new query optimization goal (metric)– Exploit excess capacity where available– Avoid narrow, long-haul paths when possible
Join processing techniques and algorithms– Identifying network structure: clusters of sites and path throughput– Optimize for non-uniform and non-metric networks– Balance network usage and computation
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Why do we need a new metric?
Minimizing response time– Consumes all available resources to achieve the goal
Minimizing computation costs– Does not address network bound applications
Minimizing the volume of network traffic– Insensitive to network heterogeneity
And we are concerned with– Polynomial-time algorithms for large-scale federations– Avoiding multi-objective optimization
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
count *
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
Occasionally transfers data across the Atlantic multiple times
SkyQuery’s computation oriented optimization– Schedule sites in order of increasing cardinality
Minimizes computation costs under several assumptions– Perfect join selectivity (holds in practice)– Computation costs linear in the size of intermediate results
(because it’s an index join)
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Balanced Network Utilization Cost of using a path is product of the volume of data transmitted
and the inverse TCP throughput Cost of a schedule is the sum over all paths
Takes advantage of path heterogeneity– By using higher-throughput paths proportionally more
Reduces contention on narrow, long-haul paths– By making them costly
But, its not a direct measure of scalability– Does not load balance paths over multiple queries
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Path Throughput Measure throughput among all federation sites pairwise
– Using a nearby PlanetLab proxy site for each SkyQuery site– 3 times a day, bulk TCP transfer
TCP throughput reflects geography– Dominant 1/distance trend correlates well with 1/RTT– But, highly non-metric
Input to scheduling
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Throughput Stability Should we measure throughput more often?
– Accurate measurements are intrusive (bulk-transfer)– Short duration measures are error prone (cross-traffic)
The most volatile paths are stable– <30% throughput variation
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Join Scheduling Assumptions
– Accurate cardinality estimates– Perfect join selectivity– Ignore the effect of attribute aggregation
Simplify one aspect of optimization (selectivity) in order to consider non-uniform, non-metric networks
– cannot use Dynamic Programming in this environment as it lacks sub-problem optimality
Two algorithms based on Minimum Spanning Trees– Two-approximate balanced network utilization– Clustering variant defines computation and utilization trade-offs
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Spanning Tree Approximation (STA)
Inputs: pairwise throughputs, site cardinalities, and a node to which we deliver results
– Min: node with lowest cardinality
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Spanning Tree Approximation (STA)
Construct a minimum spanning tree
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Achieving the Bound From min to sink visiting all sites Cost(STA) 2*cost (MST) 2*OPT Same intuition as 2-approximate Euclidean TSP
– STA can visit each site more than once– Applies to non-metric networks
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Heuristic Improvement For paths on which the triangle inequality holds
– Route directly to next unvisited node– 30% improvement in practice
Identify and use metric regions in the network
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Clustered-STA Well-connected clusters separated by narrow, long-haul paths Optimize for computation inside clusters (count *) Optimize balanced network utilization among clusters (STA)
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Clustering Sites Organize sites using Bond-Energy Algorithm
– Minimize difference between adjacent elements Extract clusters with a threshold
– 3 Mbps produces 6 clusters for 30 SkyQuery sites Define computation versus utilization tradeoff
– By tuning the extraction threshold
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Network Utilization Results are independent of assumptions
– OPT is best serial plan STA often finds OPT plan
C-STA performs poorly within clusters
– Also poor on narrow paths due to attribute aggregation
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Computation Time count * represents a “soft” lower bound C-STA reduces computation costs
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
Discussion Balanced network utilization metric captures path heterogeneity
– Avoids narrow, long-haul paths Scheduling algorithms of low complexity
– OPT is a viable alternative for serial plans Limitations of C-STA
– Does not really create meaningful utilization/computation tradeoffs Threshold can only find natural clusters
– Systematically aggregate attributes in each cluster– Semi-joins address these limitations
Extending this work to parallel schedules Applicability to other workloads? OLAP?
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
A World-Wide Telescope Federations of sky surveys make the world’s best telescope
– whole sky coverage– multi-spectral (optical, radio, infrared, x-ray) – data are always available (no clouds, no moon, day or night)
Multi-spectral and temporal experiments have already lead to many new discoveries
Wang, Burns, Terzis. Throughput-Optimized, Global-Scale Join Processing…
QuickTime™ and aTIFF (LZW) decompressor
are needed to see this picture.
The Crossmatch Query
SELECT O.object_id, O.right_accession, T.object_id FROM SDSS:Photo_Object O, TWOMASS:Photo_Primary T, FIRST:Primary_Object PWHERE AREA (185.0,-0.5,4.5) AND XMATCH (O,T,P) <3.5 AND O.type= GALAXY AND (O.i_flux - T.i_flux)>2}