low latency geo-distributed data analytics qifan pu, ganesh ananthanarayanan, peter bodik, srikanth...

20
Low Latency Geo- distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

Upload: daniel-oliver

Post on 03-Jan-2016

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

Low Latency Geo-distributed Data Analytics

Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

Page 2: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

2

WAN

Geo-distributed Data Analytics

Seattle

Berkeley Beijing

London

Slow & Wasteful

Perf. countersUser activities

“Centralized” Data Analytics Paradigm

Page 3: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

3

WANSeattle

Berkeley Beijing

London

A single logical analytics cluster across all sites.

Page 4: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

44

WANSeattle

Berkeley Beijing

London

Incorporating WAN bandwidths is key to geo-distributed analytics performance.

A single logical analytics system across all sites.

Page 5: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

5

Incorporating WAN bandwidths

• Task placement–Decides the destinations of network transfers

• Data placement–Decides the sources of network transfers

Page 6: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

Example Analytics Job

SELECT time_window, percentile(latency, 99) GROUP BY time_window

Seattle 40GB 20GB

20GBLondon 40GB

800MB/s

200MB/s

WAN

Page 7: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

0

30

60

90

120

150

180

0

0.2

0.4

0.6

0.8

1

0

10

20

30

40

50

60

Task

Fractions

Upload

Time (s)

Download

Time (s)Input Data

(GB)

Calculating Transfer TimeSeattle London

0.5 0.5

40GB40GB 12.5s 12.5s 12.5s

50s

0.2

0.8

20s 20s 20s

2.5s

2.5x

How to solve the general case, with more sites, BW heterogeneity and

data skew?

Seattle

40 20

20London

40

Page 8: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

Task Placement (TP Solver)

Task 1 -> LondonTask 2 -> BeijingTask 5 -> London…

Sites MTasks N

Data Matrix (MxN)Upload BWs

Download BWs

8

TPSolver

Optimization Goal: Minimize the longest transfer of all links

Page 9: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

0

30

60

90

120

150

180

0

0.2

0.4

0.6

0.8

1

0

10

20

30

40

50

60

Task

Fractions

Upload

Time (s)

Download

Time (s)Input Data

(GB)

London

0.2

0.8

Seattle

100GB 100GB

50s 50s

6.25s40GB

160GB

0.07

0.93

24s 24s 24s

6s

2x

50s

How to jointly optimize data and task placement?

Seattle

100 50

50London

100

Another example

Query Lag

Page 10: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

10

Iridium

Jointly optimize data and task placementwith greedy heuristic

improve query response time

bandwidth, query arrivals, etc

Approach

Goal Constraints

Page 11: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

11

Iridium with Single Dataset

Iterative heuristics for joint task-data placement.

1, Identify bottlenecksby solving task placement

2, assess:find amount of move data to alleviate current bottleneck

TPSolver

TPSolver

Until query arrivals, repeat.

Page 12: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

12

Iridium with Multiple Datasets

• Prioritize high-value datasets:

score = value x urgency / cost - value = sum(timeReduction) for all queries - urgency = 1/avg(query_lag) - cost = amount of data moved

Page 13: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

13

Iridium: putting together

• Placement of data– Before query arrival

– prioritize the move of high-value datasets

• Placement of tasks– During query execution:

– constrained solver TP

Solver

Not talked about: estimation of query arrivals, contention of move&query, etc

Page 14: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

14

Evaluation

• Spark 1.1.0 and HDFS 2.4.1– Override Spark’s task scheduler with ours– Data placement creates copies in cross-site HDFS

• Geo-distributed EC2 deployment across 8 regions– Tokyo, Singapore, Sydney, Frankfurt, Ireland,

Sao Paulo, Virginia (US) and California (US).

Page 15: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

15

• Spark jobs, SQL queries and streaming queries

– Conviva: video sessions paramters

– Bing Edge: running dashboard, streaming

– TPC-DS: decision support queries for retail

– AMP BDB: mix of Hive and Spark queries

• Baseline:– “In-place”: Leave data unmoved + Spark’s scheduling– “Centralized”: aggregate all data onto one site

How well does Iridium perform?

Page 16: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

16

Iridium outperforms 4x-19x3x-4x

Conviva Bing-Edge TPC-DS Big-Data

vs. In-place

vs. Centralized

0

20

40

60

80

100 10x

19x7x

4x

Red

uct

ion (

%)

in

Query

Resp

onse

Tim

e

3x4x4

x3x

Page 17: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

Iridium subsumes both baselines!

vs. Centralized: Data placement has higher contributionvs. In-place:Equal contributions from two techniques

MedianReduction (%) Vs. Centralized Vs. In-place

Task placementData placementIridium (both)

18%38%75%

24%30%63%

Page 18: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

0 20 40 60 800

20

40

60

80

100

IridiumMinBW

Reduct

ion (

%)

in

Query

Resp

onse

Tim

e

Reduction (%) in WAN Usage

1.5xBmin 1.3xBmin

1xBmin

(64%, 19%)

better

MinBW: a scheme that minimizes bandwidth, to BminIridium: budget the bandwidth usage to be m*BminIridium can speed up queries while using

near-optimal bandwidth cost

Bandwidth Cost

Page 19: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

19

Related work

• JetStream (NSDI’14)– Data aggregation and adaptive filtering– Does not support arbitrary queries, nor optimizes

task and data placement

• WANalytics (CIDR’15), Geode (NSDI’15)– Optimize BW usage for SQL & general DAG jobs– Can lead to poor query performance time

Page 20: Low Latency Geo-distributed Data Analytics Qifan Pu, Ganesh Ananthanarayanan, Peter Bodik, Srikanth Kandula, Aditya Akella, Paramvir Bahl, Ion Stoica

20

Low Latency Geo-distributed Data Analytics

Data is geographically distributed• Services with global footprintsAnalyze logs across DCs• “99 percentile movie rating”• “Median Skype call setup latency”

Abstraction: Single logical analytics cluster across all sites Incorporating WAN bandwidths Reduce response time over baselines by 3x – 19x

WANSeattle

Berkeley Beijing

London