presented by nirupam roy starfish: a self-tuning system for big data analytics herodotos herodotou,...

26
Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong, Fatma Bilgen Cetin, Shivnath Babu Department of Computer Science Duke University

Upload: tracy-oneal

Post on 18-Dec-2015

218 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Presented by Nirupam Roy

Starfish: A Self-tuning System for Big Data Analytics

Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong, Fatma Bilgen Cetin, Shivnath Babu

Department of Computer ScienceDuke University

Page 2: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

The Growth of Data

Page 3: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

MAD: Features of Ideal Analytics System

Magnetism

Agility

Depth

-- accept all data

-- allow complex analysis

-- adapt with data, real-time processing

Page 4: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Magnetism

Agility

Depth

-- accept all data

-- adapt with data, real-time processing

-- allow complex analysis

Hadoop is MAD

- Blindly loads data into HDFS.

- Fine-grained scheduler- End-to-end data pipeline- Dynamic node addition/ dropping

- Well integrated with programming languages

Page 5: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Tuning for Good Performance: Challenges

- Multiple dimensions of performance-- time, cost, scalability …

- Tons of Parameters-- more than 190 parameters in Hadoop.

- Multiple levels of abstraction-- job-level, workflow-level, workload-level …

Page 6: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Thumb rule

Tuning for Good Performance: Challenges

Page 7: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Thumb rule

Tuning for Good Performance: Challenges

Page 8: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Starfish: A Self-tuning System

- Builds on Hadoop- Tunes to ‘good’ performance automatically

Page 9: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Starfish Architecture

Page 10: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

The “What-if” Engine

Model + simulation based prediction algo.

Predicted performance

Learning from previous job

profiles

Analytical models to estimate

dataflow

Simulating the execution of MR

workload

Profile of a job (P)

+New

parameter set (S)

[Ref:] A What-if Engine for Cost-based MapReduce Optimization. H. Herodotou et.al.

Page 11: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

The “What-if” Engine

Ground truth Estimated by the What-if engine

Page 12: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Starfish Architecture: Job Level

Page 13: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Starfish Architecture: Job Level

Just-in-time optimizer-- Searches the parameter space

Profiler-- Collects info. on MapReduce job execution through dynamic instrumentation-- Reports timings, data size, and resource utilizationSampler-- Generates profile statistics from training benchmark jobs

Page 14: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Starfish Architecture: Workflow Level

Page 15: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Starfish Architecture: Workflow Level

Scheduler to balanced distribution of data

Block placement policy for data collocation

-- deals with skewed data, add/drop of nodes, tradeoff between balanced data v/s data-locality

-- Local-write v/s round-robin

Page 16: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Starfish Architecture: Workflow Level

Producer

Consumer

Wasted production

Page 17: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Starfish Architecture: Workflow Level

File level parallelism

Block level parallelism

Page 18: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Starfish Architecture: Workflow Level

What-if simulation

Workflow Aware OptimizerSelect best data layout and job parameters

• MR job execution• Task scheduling• Block placement

Compare cost & benefits

Running time?

Data layout?

Page 19: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Starfish Architecture: Workload Level

Page 20: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Starfish Architecture: Workload Level

Workload Optimizer

Elastisizer • Determine best cluster and Hadoop configurations

• Jumbo operator• Cost based estimation for

best optimization

Page 21: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Starfish: Summary

- Optimizes on different granularities-- Workload, workflow, job (procedural & declarative)

- Considers different decision points-- Provisioning, optimization, Scheduling, Data layout

Page 22: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Starfish: Piazza Discussion

1) Limited evaluation: 10

Top criticisms (till 1:30pm, 17 reviews):

2) Not explained well: 7 3) Profiler overhead/better search algo: 5

* What is the effect of wrong prediction?

* What-if engine requires prior knowledge.

Page 23: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

http://www.cs.duke.edu/starfish/

Thank you.

Photo courtesy: Starfish group, Duke University

Page 24: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Going MAD with Big Data

Magnetic system

Agile system and Analytics

Deep Analytics

Data Life Cycle Awareness

Elasticity

Robustness

Page 25: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Backup: What-if Engine 1

Page 26: Presented by Nirupam Roy Starfish: A Self-tuning System for Big Data Analytics Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong,

Backup: What-if Engine 2