Meeting Service Level Objectives of Pig Programs Zhuoyao Zhang, Ludmila Cherkasova, Abhishek Verma, Boon Thau Loo University of Pennsylvania Hewlett-Packard Labs


Post on 24-Feb-2016


Page 1: Meeting Service Level Objectives  of Pig Programs

Meeting Service Level Objectives of Pig Programs

Zhuoyao Zhang, Ludmila Cherkasova,

Abhishek Verma, Boon Thau Loo

University of Pennsylvania, Hewlett-Packard Labs

Page 2: Meeting Service Level Objectives  of Pig Programs

Cloud Environment
•Advantages
▫Large amount of resources
▫Elasticity
▫Pay-as-you-go pricing model
•Challenges
▫Distributed resources
▫Error-prone

Page 3: Meeting Service Level Objectives  of Pig Programs

MapReduce and Pig
•MapReduce: Simple and fault-tolerant framework for data processing in the cloud
•Pig
▫Advanced MapReduce-based platform
▫Widely used: Yahoo!, Twitter, LinkedIn
▫PigLatin: A high-level declarative language for expressing data analysis tasks as Pig programs
[Figure: a Pig program shown as a workflow of MapReduce jobs j1 to j7]

Page 4: Meeting Service Level Objectives  of Pig Programs

Motivation
•Latency-sensitive applications
▫Personalized advertising
▫Spam and fraud detection
▫Real-time log analysis
•How many resources does an application need to meet its deadline?

Page 5: Meeting Service Level Objectives  of Pig Programs

Contributions
•Performance modeling for Pig programs
▫Given a Pig program, estimate its completion time as a function of the assigned resources
•Deadline-driven resource allocation estimates for Pig programs
▫Given a completion time target, determine the amount of resources a Pig program needs to achieve it

Page 6: Meeting Service Level Objectives  of Pig Programs

Outline
•Introduction
•Building block
▫Performance model for single MapReduce jobs
•Resource allocation for Pig programs
•Evaluation
•Conclusion and ongoing work

Page 7: Meeting Service Level Objectives  of Pig Programs

Theoretical Makespan Bounds
•Bounds-based makespan estimates
▫n tasks, k servers
▫avg: average duration of the n tasks
▫max: maximum duration of the n tasks
•Lower bound
$T^{low} = \frac{n \cdot avg}{k}$
•Upper bound
$T^{up} = \frac{(n - 1) \cdot avg}{k} + max$
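The two bounds can be checked against a simulated schedule. The sketch below is illustrative only: it assumes tasks are assigned greedily, each one going to the server that becomes free first, and the function names `makespan_bounds` and `greedy_makespan` are made up for this example. The bounds it computes are n·avg/k and (n − 1)·avg/k + max.

```python
import heapq


def makespan_bounds(durations, k):
    """Return (lower, upper) makespan bounds for n tasks on k servers."""
    n = len(durations)
    if n == 0 or k <= 0:
        raise ValueError("need at least one task and one server")
    avg = sum(durations) / n
    longest = max(durations)
    lower = n * avg / k                   # perfectly balanced load
    upper = (n - 1) * avg / k + longest   # longest task starts last
    return lower, upper


def greedy_makespan(durations, k):
    """Simulate greedy assignment: each task goes to the earliest-free server."""
    free_at = [0.0] * k                   # time at which each server is free
    heapq.heapify(free_at)
    for d in durations:
        start = heapq.heappop(free_at)    # earliest-free server
        heapq.heappush(free_at, start + d)
    return max(free_at)


if __name__ == "__main__":
    tasks = [2, 5, 1, 3, 4, 2]            # hypothetical task durations
    low, up = makespan_bounds(tasks, k=3)
    print(low, greedy_makespan(tasks, k=3), up)
```

Whatever order the tasks arrive in, the greedy makespan should fall between the two bounds.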

Page 8: Meeting Service Level Objectives  of Pig Programs

Illustration
Schedule 1: 1 4 3 2 3 1 2
Makespan = 4, Lower bound = 4
Schedule 2: 3 1 2 3 2 1 4
Makespan = 7, Upper bound = 8
[Figure: timelines of the two schedules]

Page 9: Meeting Service Level Objectives  of Pig Programs

Estimate Completion Time for Single MR Job
•Estimate the bounds of the job completion time based on the job profile
▫Most production jobs are executed routinely on new data sets
▫Job profile built from previous runs
  Map stage: Mavg, Mmax, AvgInputSize, Selectivity
  Reduce stage: Shavg, Shmax, Ravg, Rmax, Selectivity
▫Predict the completion time of future runs with the profile

Page 10: Meeting Service Level Objectives  of Pig Programs

Estimate Completion Time for Single MR Job
•Estimating bounds on the duration of map and reduce stages
•Map stage duration depends on:
▫$N_M$ -- the number of map tasks
▫$S_M$ -- the number of map slots
$T_M^{low} = \frac{N_M \cdot M_{avg}}{S_M}$
$T_M^{up} = \frac{(N_M - 1) \cdot M_{avg}}{S_M} + M_{max}$
•Reduce stage duration depends on:
▫$N_R$ -- the number of reduce tasks
▫$S_R$ -- the number of reduce slots
•Job duration $T_J^{low}$, $T_J^{up}$, $T_J^{avg}$
▫Sum of the map and reduce stage durations
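A minimal sketch of how a job profile could feed these bounds. Two things here are assumptions rather than part of the slides: the `JobProfile` class and function names are hypothetical, and the reduce stage is treated as a shuffle phase followed by a reduce phase, each bounded with the same N_R and S_R. Each stage uses the bounds N·avg/S and (N − 1)·avg/S + max.

```python
from dataclasses import dataclass


@dataclass
class JobProfile:
    """Per-task statistics measured on a previous run (hypothetical layout)."""
    m_avg: float   # average map task duration
    m_max: float   # maximum map task duration
    sh_avg: float  # average shuffle duration
    sh_max: float  # maximum shuffle duration
    r_avg: float   # average reduce task duration
    r_max: float   # maximum reduce task duration


def stage_bounds(n_tasks, n_slots, avg, longest):
    """Lower and upper bound on one stage with n_tasks tasks on n_slots slots."""
    low = n_tasks * avg / n_slots
    up = (n_tasks - 1) * avg / n_slots + longest
    return low, up


def job_bounds(p, n_map, n_reduce, s_map, s_reduce):
    """Bounds on job duration: sum of map and reduce stage bounds.

    Assumption: shuffle and reduce phases run one after the other and
    are bounded separately with the same task and slot counts.
    """
    stages = [
        stage_bounds(n_map, s_map, p.m_avg, p.m_max),
        stage_bounds(n_reduce, s_reduce, p.sh_avg, p.sh_max),
        stage_bounds(n_reduce, s_reduce, p.r_avg, p.r_max),
    ]
    low = sum(s[0] for s in stages)
    up = sum(s[1] for s in stages)
    return low, up
```

Calling `job_bounds` with different slot counts shows how the estimated completion time shrinks as more map or reduce slots are assigned.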

Page 11: Meeting Service Level Objectives  of Pig Programs

Resource Allocation for Single MR Job
•Given a deadline D and the job profile, find the minimal resources needed to complete the job within D
▫Inputs: the given number of map/reduce tasks and the statistics from the job profile
▫Find the values of $S_M^J$ and $S_R^J$ that minimize $S_M^J + S_R^J$ using Lagrange multipliers
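The Lagrange multiplier step can be worked through under one assumption: that the deadline constraint has the form a/S_M + b/S_R + c = D, where a, b and c are positive coefficients derived from the job profile and task counts. The names a, b and c are illustrative, and the derivation needs D > c.

```latex
\begin{align*}
&\min_{S_M,\,S_R}\; S_M + S_R
  \quad\text{subject to}\quad \frac{a}{S_M} + \frac{b}{S_R} = D - c \\
&\mathcal{L} = S_M + S_R
  + \lambda\left(\frac{a}{S_M} + \frac{b}{S_R} - (D - c)\right) \\
&\frac{\partial \mathcal{L}}{\partial S_M} = 1 - \frac{\lambda a}{S_M^2} = 0,
  \qquad
  \frac{\partial \mathcal{L}}{\partial S_R} = 1 - \frac{\lambda b}{S_R^2} = 0 \\
&\Rightarrow\; S_M = \sqrt{\lambda a}, \quad S_R = \sqrt{\lambda b}, \quad
  \sqrt{\lambda} = \frac{\sqrt{a} + \sqrt{b}}{D - c} \\
&\Rightarrow\; S_M = \frac{a + \sqrt{ab}}{D - c}, \qquad
  S_R = \frac{b + \sqrt{ab}}{D - c}
\end{align*}
```

Since slots come in whole numbers, the real-valued solution would be rounded up in practice.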

Page 12: Meeting Service Level Objectives  of Pig Programs

Outline
•Introduction
•Building block
▫Performance model for single MapReduce jobs
•Resource allocation for Pig programs
•Evaluation
•Conclusion and ongoing work

Page 13: Meeting Service Level Objectives  of Pig Programs

Performance Model for Pig Programs
•Let $P = \{J_1, J_2, \ldots, J_N\}$; extract the job profile of each job contained in P
▫Assign a unique name to each job within a program
•The program completion time is the sum of the completion times of all the jobs contained in P
$T_P = \sum_{1 \le i \le N} T_i$

Page 14: Meeting Service Level Objectives  of Pig Programs

Resource Allocation for Pig Programs
•Possible strategy: find an appropriate pair of map and reduce slots for each job in the program
$\frac{A_1}{S_M^1} + \frac{B_1}{S_R^1} + C_1 = d_1$
$\frac{A_2}{S_M^2} + \frac{B_2}{S_R^2} + C_2 = d_2$
...
$\frac{A_N}{S_M^N} + \frac{B_N}{S_R^N} + C_N = d_N$
with $\sum_{1 \le i \le N} d_i = D$
•Problem: difficult for the scheduler to implement and manage

Page 15: Meeting Service Level Objectives  of Pig Programs

Resource Allocation for Pig Programs
•A simpler and more elegant solution
▫Allocate the same set of resources to the entire program instead of to each job
•Rewrite the previous equations into
$T_P = \frac{\sum_{1 \le i \le N} A_i}{S_M^P} + \frac{\sum_{1 \le i \le N} B_i}{S_R^P} + \sum_{1 \le i \le N} C_i = D$
•Find the minimum set of map and reduce slots ($S_M^P$, $S_R^P$) for the entire Pig program
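A sketch of the program-level allocation, under stated assumptions. Each job is assumed to contribute coefficients (A_i, B_i, C_i) to a constraint of the form ΣA_i/S_M + ΣB_i/S_R + ΣC_i = D. The minimum of S_M + S_R under that constraint is the standard Lagrange multiplier closed form, rounded up to whole slots. The function names and input layout are hypothetical.

```python
import math


def completion_time(jobs, s_map, s_reduce):
    """Program completion time for a shared (s_map, s_reduce) allocation."""
    a = sum(j[0] for j in jobs)
    b = sum(j[1] for j in jobs)
    c = sum(j[2] for j in jobs)
    return a / s_map + b / s_reduce + c


def program_allocation(jobs, deadline):
    """Smallest slot pair meeting the deadline for the whole program.

    jobs: list of (A_i, B_i, C_i) tuples with positive A_i and B_i
          (assumed per-job coefficients).
    """
    a = sum(j[0] for j in jobs)
    b = sum(j[1] for j in jobs)
    c = sum(j[2] for j in jobs)
    slack = deadline - c                  # time left for the slot-dependent terms
    if slack <= 0:
        raise ValueError("deadline cannot be met with any number of slots")
    root = math.sqrt(a * b)
    s_map = (a + root) / slack            # real-valued optimum
    s_reduce = (b + root) / slack
    # Rounding up keeps the deadline satisfied, since more slots never hurt.
    return math.ceil(s_map), math.ceil(s_reduce)


if __name__ == "__main__":
    jobs = [(100, 50, 10), (300, 50, 20)]  # hypothetical coefficients
    slots = program_allocation(jobs, deadline=130)
    print(slots, completion_time(jobs, *slots))
```

One allocation for the whole program means the scheduler manages a single slot pair instead of one pair per job.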

Page 16: Meeting Service Level Objectives  of Pig Programs

Experiment Setup
•66-node cluster in 2 racks
▫4 AMD 2.39GHz cores
▫8 GB RAM
▫Two 160GB hard disks
•Configuration
▫1 jobtracker, 1 namenode, 64 worker nodes
▫2 map slots and 1 reduce slot for each node

Page 17: Meeting Service Level Objectives  of Pig Programs

Benchmark
•PigMix benchmark
▫17 programs
▫8 tables as the input data
•Dataset
▫Test dataset
  Generated with the PigMix data generator
  Total size around 1TB
▫Experimental dataset
  Same layout as the test dataset
  20% larger in size

Page 18: Meeting Service Level Objectives  of Pig Programs

Model Accuracy
•How well does our performance model capture Pig program completion time?
[Figure: Normalized results for predicted and measured completion time]

Page 19: Meeting Service Level Objectives  of Pig Programs

Meeting Deadlines
•Are we meeting deadlines with our resource allocation model?
[Figure: PigMix executed on the experimental data set: do we meet deadlines?]

Page 20: Meeting Service Level Objectives  of Pig Programs

Conclusion
•Conclusion
▫The performance model can accurately estimate the completion time of MapReduce workflows
▫It enables automatic resource provisioning for MapReduce workflows with deadlines
•Ongoing work
▫Refine the performance model for workflows with concurrent jobs
▫Incorporate failure scenarios into the current model

Page 21: Meeting Service Level Objectives  of Pig Programs

Thank you