Resource-Aware Scheduling for Hadoop [Final Presentation]


DESCRIPTION

Presentation slides I used for the final presentation of my final year project

TRANSCRIPT

RESOURCE-AWARE SCHEDULING FOR HADOOP

Lu Wei

Project No: H064420

Supervisor: Professor Tan Kian-Lee

National University of Singapore, School of Computing

Department of Information Systems

1

MapReduce & Hadoop

2

MapReduce

•  Distributed data processing framework by Google

•  Job
   – Map function
   – Reduce function

3

Hadoop Architecture

4

Existing Schedulers

5

Early Schedulers

•  FIFO: MapReduce default, by Google
   – Priority level & submission time
   – Data locality
   – Problem: starvation of other jobs in the presence of a long-running job

•  Hadoop On Demand (HOD): by Yahoo!
   – Fairness: static node allocation using the Torque resource manager
   – Problem: poor data locality & underutilization

6

Mainstream Schedulers

•  Fair Scheduler: by Facebook
   – Fairness: dynamic resource redistribution
   – Challenges:
      •  Data locality – solved with delay scheduling
      •  Reduce/map dependence – solved with copy-compute splitting

•  Capacity Scheduler: by Yahoo!
   – Similar to Fair Scheduler
   – Special support for memory-intensive jobs

7

Alternative Schedulers

•  Adaptive Scheduler (2010–2011)
   – Goal/deadline oriented
   – Adaptively establishes predictions by job matching
   – Problem: strong assumptions & questionable performance

•  Machine Learning Approach (2010)
   – Naïve Bayes & Perceptron with the aid of user hints
   – Better performance than FIFO
   – Underutilization during learning phase & overhead

8

Existing Schedulers

| Scheduler | Pro | Con | Resource-Awareness |
| --- | --- | --- | --- |
| FIFO | High throughput | Starvation of short jobs | Data locality |
| HOD | Sharing of cluster | Poor data locality & underutilization | – |
| Fair Scheduler | Fairness & dynamic resource re-allocation | Complicated configuration | Data locality; copy-compute splitting |
| Capacity Scheduler | Similar to FS | Similar to FS | Special support for memory-intensive jobs |
| Adaptive Scheduler | Adaptive approach | Strong assumptions & questionable performance | Resource utilization control using job matching |
| Machine Learning | Reported better performance than FIFO | Underutilization during learning phase & overhead | Resource utilization control using pattern classification |

9

Motivations

•  Heterogeneity by Configuration
   – Hardware capacity differences among cluster nodes

•  Heterogeneity by Usage
   – All task slots are treated equally, with no consideration of the resource status of the current node or the resource demands of queuing jobs
   – Possible that a CPU-busy node is assigned a CPU-intensive job, and an I/O-busy node an I/O-intensive job

10

Resource-Aware Scheduler

11

Design Overview

1.  Capture
    –  the job's resource demand characteristics
    –  the TaskTracker's static capability & runtime usage status

2.  Combine and transform them into quantified measurements

3.  Predict how fast a given TaskTracker is expected to finish a given task

4.  Apply the scheduling policy of choice (a rough flow sketch follows below)

12
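To make the four steps concrete, here is a minimal, hypothetical Java sketch of the flow. None of these class or method names come from the project, and the placeholder arithmetic in `estimateTaskTime` merely stands in for the actual formulas shown on the later design-detail slides.

```java
import java.util.Comparator;
import java.util.List;

// A hypothetical, high-level sketch of the four steps above. All names are
// placeholders; the real implementation hooks into Hadoop's JobTracker and
// the TaskTracker heartbeats (see the design-detail slides that follow).
public class SchedulingFlowSketch {
    record NodeStatus(double cpuScore, double diskScore, double networkScore) {}          // step 1: node status
    record JobProfile(String jobId, double sampledCpu, double sampledDisk, double sampledNetwork) {} // step 1: job demand

    // Steps 2-3: combine the job sample and node scores into a single estimated
    // task processing time on that node (placeholder arithmetic only).
    static double estimateTaskTime(JobProfile job, NodeStatus node) {
        return job.sampledCpu() / Math.max(node.cpuScore(), 1e-6)
             + job.sampledDisk() / Math.max(node.diskScore(), 1e-6)
             + job.sampledNetwork() / Math.max(node.networkScore(), 1e-6);
    }

    // Step 4: apply the chosen policy, e.g. schedule the job expected to finish fastest here.
    static JobProfile pick(List<JobProfile> queued, NodeStatus node) {
        return queued.stream()
                     .min(Comparator.comparingDouble(j -> estimateTaskTime(j, node)))
                     .orElseThrow();
    }

    public static void main(String[] args) {
        NodeStatus cpuBusyNode = new NodeStatus(0.2, 0.9, 0.8);
        List<JobProfile> queue = List.of(
                new JobProfile("cpu-heavy", 50, 5, 5),
                new JobProfile("io-heavy", 5, 40, 10));
        // On a CPU-busy node the I/O-heavy job is expected to finish sooner.
        System.out.println("Scheduled on CPU-busy node: " + pick(queue, cpuBusyNode).jobId());
    }
}
```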

Design Details

•  TaskTracker Profiling
   – Resource scores: represent availability
   – Sampled every second (at every heartbeat) for each TaskTracker
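As an illustration of what a per-heartbeat resource score could look like, here is a small, hypothetical Java sketch. The field names and the idea of scoring availability as one minus the busy fraction are assumptions made for illustration, not the project's actual code.

```java
// A minimal sketch (not the project's code) of deriving per-node resource
// scores from measurements taken at each TaskTracker heartbeat.
public class ResourceScoreSampler {
    /** Availability scores in [0, 1]; higher means more spare capacity. */
    public static final class ResourceScores {
        public final double cpu, disk, network;
        public ResourceScores(double cpu, double disk, double network) {
            this.cpu = cpu; this.disk = disk; this.network = network;
        }
    }

    /** Turn raw utilisation readings (fractions of capacity in use) into availability scores. */
    public static ResourceScores sample(double cpuBusyFraction,
                                        double diskBusyFraction,
                                        double networkBusyFraction) {
        return new ResourceScores(1.0 - cpuBusyFraction,
                                  1.0 - diskBusyFraction,
                                  1.0 - networkBusyFraction);
    }

    public static void main(String[] args) {
        ResourceScores s = sample(0.75, 0.20, 0.10); // e.g. a CPU-busy node
        System.out.printf("cpu=%.2f disk=%.2f network=%.2f%n", s.cpu, s.disk, s.network);
    }
}
```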

13

Design Details

•  Task-Based Job Sampling
   –  Assumption:
   –  Target measurements: task resource demand; TaskTracker resource statuses
   –  Technique:
      •  Periodic re-sampling: avoids over-reliance on a single job sample

   $t_{\text{sample}} = t_{\text{s-cpu}} + t_{\text{s-disk}} + t_{\text{s-network}}$

14

Design Details

•  Task Processing Time Estimation

   $t_{\text{estimate}} = t_{\text{e-cpu}} + t_{\text{e-disk}} + t_{\text{e-network}}$

   $t_{\text{estimate}} = t_{\text{s-cpu}} \times \dfrac{c_{\text{s-cpu}}}{c_{\text{cpu}}} + t_{\text{e-disk-in}} + t_{\text{e-disk-out}} + t_{\text{e-disk-spill}} + t_{\text{e-network-in}} + t_{\text{e-network-out}}$

   $t_{\text{e-disk-in}} = t_{\text{s-disk-in}} \times \dfrac{c_{\text{s-disk-read}}}{c_{\text{disk-read}}} \times \dfrac{s_{\text{disk-in}}}{s_{\text{s-disk-in}}}$

   $s_{\text{disk-spill}} = \dfrac{s_{\text{s-disk-spill}}}{s_{\text{s-in}}} \times s_{\text{in}}$

   $s_{\text{network-out}} = \dfrac{s_{\text{out}}}{N_{\text{total-reduce}}} = \beta_{\text{s-oi-ratio}} \times \dfrac{s_{\text{in}}}{N_{\text{total-reduce}}}$
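The following is a minimal Java sketch of the scaling idea behind these formulas: a sampled component time is stretched by the ratio of the sampling node's capability score to the target node's score, and an I/O component is additionally scaled by the data-size ratio. The method names and example numbers are hypothetical.

```java
// A minimal sketch (hypothetical names, not the project's code) of the scaling
// used in the estimation formulas above.
public class TaskTimeEstimator {

    /** t_e-cpu = t_s-cpu * (c_s-cpu / c_cpu): a less capable target node gives a larger estimate. */
    static double scaleByCapability(double sampleTime, double sampleCapability, double targetCapability) {
        return sampleTime * (sampleCapability / targetCapability);
    }

    /** e.g. t_e-disk-in = t_s-disk-in * (c_s-disk-read / c_disk-read) * (s_disk-in / s_s-disk-in). */
    static double scaleIoComponent(double sampleTime, double sampleCapability, double targetCapability,
                                   double targetBytes, double sampleBytes) {
        return scaleByCapability(sampleTime, sampleCapability, targetCapability) * (targetBytes / sampleBytes);
    }

    public static void main(String[] args) {
        // Sampled CPU time of 10 s on a node with CPU score 0.8; target node scores 0.4 (busier).
        double tCpu = scaleByCapability(10.0, 0.8, 0.4);
        // Sampled disk-in of 5 s for 64 MB at disk-read score 0.9; target reads 128 MB at score 0.6.
        double tDiskIn = scaleIoComponent(5.0, 0.9, 0.6, 128, 64);
        System.out.printf("t_estimate ~ %.1f s (cpu) + %.1f s (disk-in) + ...%n", tCpu, tDiskIn);
    }
}
```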

15

Design Details

•  Scheduling policies
   – Map tasks
      •  Shortest Job First (SJF)
      •  Starvation of long-running jobs: addressed by periodic re-sampling
   – Reduce tasks
      •  Naïve I/O Biasing (sketched below)
         – Do not schedule an I/O-intensive job on an I/O-busy node when there are other reduce slots with higher disk I/O availability
         – I/O-intensive job: judged using the map-phase sample
         – I/O-busy node: disk I/O scores below the cluster average
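Below is a minimal sketch of the naïve I/O biasing check for reduce tasks, using hypothetical names and values; the map-side SJF policy is simply an ordering of queued jobs by the estimated task processing time from the previous slide.

```java
import java.util.List;

// A minimal, hypothetical sketch of the naive I/O biasing rule described above;
// all names are placeholders for whatever the scheduler's reduce-assignment path uses.
public class IoBiasingSketch {
    /**
     * Rule from the slide: do not place an I/O-intensive reduce task on an
     * I/O-busy node (disk score below the cluster average) while reduce slots
     * with higher disk I/O availability still exist.
     */
    static boolean allowReduceOnNode(boolean jobIsIoIntensive,
                                     double nodeDiskScore,
                                     double clusterAvgDiskScore,
                                     boolean freeSlotsWithHigherDiskScoreExist) {
        if (jobIsIoIntensive
                && nodeDiskScore < clusterAvgDiskScore
                && freeSlotsWithHigherDiskScoreExist) {
            return false; // hold the task for a slot with more disk headroom
        }
        return true;
    }

    public static void main(String[] args) {
        List<Double> diskScores = List.of(0.2, 0.8, 0.7); // per-node disk availability
        double avg = diskScores.stream().mapToDouble(Double::doubleValue).average().orElse(0);
        // An I/O-intensive job (judged from its map-phase sample) on the busiest node:
        System.out.println(allowReduceOnNode(true, 0.2, avg, true));   // false: skip this node
        // A CPU-intensive job is unaffected by the disk check:
        System.out.println(allowReduceOnNode(false, 0.2, avg, true));  // true
    }
}
```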

16

Implementation

[Architecture diagram showing the scheduler components and their interactions: ResourceScheduler, MapTaskFinishTimeEstimator, MapSampleReportLogger holding a HashMap<JobID, MapSampleReport>, JobTracker, TaskTracker, TaskTrackerStatus, ResourceStatus, ResourceCalculatorPlugin, JobInProgress / MyJobInProgress, TaskInProgress, Task, TaskStatus, SampleTaskStatus. Exchanged data: job profiles, resource profiles, resource scores, sample task processing time & data sizes, estimated task processing time.]

https://github.com/weilu/Hadoop-Resource-Aware-Scheduler

17

Evaluation & Results

18

Estimation Accuracy

•  Cluster Configuration I
   –  Shared with other users and other applications
   –  1 master, 10 slave nodes
   –  1 Gbps network, same rack
   –  Each node:
      •  4 processors: Intel Xeon E5607 quad-core CPU (2.26 GHz),
      •  32 GB memory, and
      •  1 TB hard disk

•  Hadoop Configuration
   –  HDFS block size: 64 MB
   –  Data replication: 1
   –  Each node:
      •  Map slots: 1
      •  Reduce slots: 2
   –  Speculative map & reduce tasks: off
   –  Completed maps required before scheduling reduces: 1 out of 1000 total maps

19

Estimation Accuracy

•  Workload description:
   –  I/O workload: wordcount (see the sketch after this slide)
      •  Counts the occurrences of each word in the given input files
      •  Mapper: scans through the input; outputs each word with the word itself as the key and 1 as the value, sorted on the key
      •  Reducer: collects records with the same key by adding up the values; outputs the key and its total occurrence count
   –  CPU workload: pi estimation
      •  Approximates the value of pi by counting the number of points that fall within the unit quarter circle
      •  Mapper: reads coordinates of points; counts points inside/outside the inscribed circle of the square
      •  Reducer: accumulates the inside/outside counts from the mappers
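For reference, the mapper/reducer pair described above corresponds to the classic Hadoop wordcount example. The sketch below uses the standard org.apache.hadoop.mapreduce API and is illustrative rather than the exact benchmark code used in the project (the job driver setup is omitted).

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// The classic Hadoop wordcount mapper/reducer pair, shown only to make the
// I/O workload description above concrete.
public class WordCount {
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            // Emit (word, 1) for every token in the input line.
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            // Sum the counts gathered for each word during the shuffle.
            int sum = 0;
            for (IntWritable v : values) sum += v.get();
            context.write(key, new IntWritable(sum));
        }
    }
}
```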

20

Estimation Accuracy

•  I/O Workload 1

[Chart: Estimated vs. Actual Task Execution Time (Resource Scheduler, wordcount, 10 nodes, 5 GB input data, single job); series: estimate, actual]

21

Estimation Accuracy

•  I/O Workload 2

[Chart: Estimated vs. Actual Task Execution Time (Resource Scheduler, wordcount, 10 nodes, 5 GB input data, single job); series: estimate, actual]

22

Estimation Accuracy

•  CPU Workload 1

[Chart: Resource Scheduler pi estimation (10 nodes, 100 maps, 10^8 points each, single job); series: estimated, actual]

23

Estimation Accuracy

•  CPU Workload 2

[Chart: Resource Scheduler pi estimation (10 nodes, 100 maps, 10^9 points each, single job); series: estimated, actual]

24

Performance Benchmark: Resource Scheduler vs. FIFO Scheduler

•  Cluster Configuration II (differences from Configuration I)
   – Reserved and unshared
   – 1 master, 5 slave nodes

•  Workload Description
   – Single I/O job: wordcount
   – Single CPU job: pi estimation
   – Simultaneous submission of an I/O job and a CPU job

•  Overhead Evaluation
   – Baseline establishment: reality test

25

Performance Benchmark: Resource Scheduler vs. FIFO Scheduler – Resource-Homogeneous Environment

•  Overhead Evaluation

[Table 9 – evaluation and results: wordcount in a resource-homogeneous environment, 3 runs (summary)]

[Table 10 – evaluation and results: pi estimation in a resource-homogeneous environment, 3 runs (summary)]

26

Performance Benchmark: Resource Scheduler vs. FIFO Scheduler

•  FIFO vs. Resource Scheduler in a Resource-Homogeneous Environment

27

Performance Benchmark: Resource Scheduler vs. FIFO Scheduler

•  Analysis
   –  Negligible overhead
   –  Resource Scheduler performs worse: slowdown in all measured dimensions and cases
   –  Reason: the Resource Scheduler has more concurrent running reducers competing for resources
   –  Expectation: same performance in a busy cluster (all reduce slots constantly filled with running tasks)

[Chart: FIFO vs. Resource Scheduler in a Resource-Homogeneous Environment (simultaneous submission of an I/O job and a CPU job): total map time (sec) and total job time (sec), showing worst/average/best runs for each scheduler]

28

Performance Benchmark: Resource Scheduler vs. FIFO Scheduler – Resource-Heterogeneous Environment

•  Environment Simulation
   – CPU intervention: non-MapReduce pi estimation
   – Disk I/O intervention: dd 50 G write-read

•  Simulated Environment
   – 3 CPU-busy nodes + 2 disk-I/O-busy nodes

29

Performance Benchmark: Resource Scheduler vs. FIFO Scheduler

•  FIFO vs. Resource Scheduler in a Resource-Heterogeneous Environment (sequential submission of 2 jobs)

30

Performance Benchmark: Resource Scheduler vs. FIFO Scheduler

•  FIFO vs. Resource Scheduler in a Resource-Heterogeneous Environment (concurrent submission of 2 jobs)

31

Performance Benchmark: Resource Scheduler vs. FIFO Scheduler

[Chart: FIFO vs. Resource Scheduler in a Resource-Heterogeneous Environment (simultaneous submission of an I/O job and a CPU job): total map time (sec) and total job time (sec), showing worst/average/best runs for each scheduler]

[Chart: Total map time percentage slowdown of the Resource Scheduler relative to the FIFO scheduler (best/average/worst), homogeneous vs. heterogeneous environment]

[Chart: Total job time percentage slowdown of the Resource Scheduler relative to the FIFO scheduler (best/average/worst), homogeneous vs. heterogeneous environment]

32

Conclusion

•  Resource-based map task processing time estimation is satisfactory

•  The Resource Scheduler did not manage to outperform the FIFO scheduler in the resource-homogeneous environment, or in most cases of the resource-heterogeneous environment, due to the extra concurrent reduce tasks

•  However, we verified that the Resource Scheduler is indeed resource aware: it performs better when moved from a resource-homogeneous environment to a resource-heterogeneous environment:
   –  Smaller percentage slowdown compared to FIFO in all cases and all measured dimensions
   –  Observed speedup compared to FIFO in the worst cases, due to I/O-biased scheduling during the reduce stage

33

Recommendations for Future Work

•  Evaluation
   – Heavier workload & busy cluster
      •  Observe overhead
      •  Benchmark performance

•  Scheduling policy
   – Map tasks
      •  Highest Response Ratio Next (HRRN)
   – Reduce tasks
      •  CPU biasing for CPU-intensive jobs

   $\text{priority} = \dfrac{t_{\text{estimated}} + t_{\text{waiting}}}{t_{\text{estimated}}} = 1 + \dfrac{t_{\text{waiting}}}{t_{\text{estimated}}}$
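A quick numeric illustration of the HRRN priority, using hypothetical values: short jobs start with low priority but long-waiting jobs catch up, which addresses starvation.

```java
// Tiny illustration (hypothetical numbers) of the HRRN priority above.
public class HrrnPriority {
    static double priority(double estimated, double waiting) {
        return 1.0 + waiting / estimated; // (estimated + waiting) / estimated
    }

    public static void main(String[] args) {
        System.out.println(priority(100, 0));   // freshly submitted job: 1.0
        System.out.println(priority(100, 300)); // same job after waiting 300 s: 4.0
    }
}
```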

34
