SAMR: A Self-adaptive MapReduce Scheduling Algorithm in Heterogeneous Environment

Authors: Quan Chen, Daqiang Zhang, Minyi Guo, Qianni Deng (Department of Computer Science, Shanghai Jiao Tong University, Shanghai, China); Song Guo (School of Computer Science and Engineering, The University of Aizu, Japan)

Presented by Xiaoyu Sun

Upload: ursula-underwood

Post on 23-Dec-2015




Table of Contents

Overview

Scheduling in Hadoop

Heterogeneity in Hadoop

The LATE Scheduler (Longest Approximate Time to End)

The SAMR Scheduler (A Self-adaptive MapReduce Scheduling Algorithm)

Experiment

Conclusion

Overview

[Figure: MapReduce execution overview. The user program forks a master and several workers. The master assigns map and reduce tasks. Map workers read the input data splits (Split 0, Split 1, Split 2) and write intermediate results locally. Reduce workers read those results remotely, sort them, and write Output File 0 and Output File 1.]

The Map Step

[Figure: each map call turns input key-value pairs into intermediate key-value pairs.]

The Reduce Step

[Figure: intermediate key-value pairs are grouped by key into key-value groups; each reduce call turns one group into output key-value pairs.]
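The map, group, and reduce stages can be sketched in a few lines of single-process Python. This is only an illustration of the data flow: the word-count job and every function name below are hypothetical and do not come from the slides or from Hadoop's API.

```python
from collections import defaultdict

def map_step(records, map_fn):
    """Apply map_fn to every input (key, value) pair and collect intermediate pairs."""
    intermediate = []
    for key, value in records:
        intermediate.extend(map_fn(key, value))
    return intermediate

def group_step(pairs):
    """Group intermediate pairs by key: {key: [value, value, ...]}."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)

def reduce_step(groups, reduce_fn):
    """Apply reduce_fn to every key-value group and collect output pairs."""
    output = []
    for key, values in groups.items():
        output.extend(reduce_fn(key, values))
    return output

# Hypothetical example job: word count.
def word_count_map(_doc_name, line):
    return [(word, 1) for word in line.split()]

def word_count_reduce(word, counts):
    return [(word, sum(counts))]

def run_word_count(records):
    intermediate = map_step(records, word_count_map)
    groups = group_step(intermediate)
    return sorted(reduce_step(groups, word_count_reduce))
```

Calling `run_word_count([("a", "x y x"), ("b", "y")])` pairs each word with its total count. In a real cluster the three stages run on different workers, which is where scheduling decisions start to matter.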


Overview

Google has noted that speculative execution improves response time by 44%

The paper shows an efficient way to do speculative execution in order to maximize performance

It also shows that Hadoop’s simple speculative algorithm, which compares each task’s progress to the average progress, breaks down in heterogeneous systems
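A minimal sketch of that average-based comparison, assuming a task is flagged when its progress score falls more than a fixed margin below the mean. The 0.2 margin and the function name are assumptions for illustration; the slides do not state the exact rule.

```python
def average_based_stragglers(progress_scores, margin=0.2):
    """Return tasks whose progress score is below (average - margin).

    progress_scores: {task_name: score in [0, 1]}
    margin: assumed fixed gap below the average (hypothetical default).
    """
    if not progress_scores:
        return []
    average = sum(progress_scores.values()) / len(progress_scores)
    threshold = average - margin
    return sorted(name for name, score in progress_scores.items()
                  if score < threshold)
```

Because the rule looks only at scores, a task on a slow node and a task that simply started later look the same, which is one reason it misfires when nodes differ in speed.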


Overview

The proposed scheduling algorithm improves Hadoop’s response time

The paper addresses two important problems in speculative execution:

Choosing the best node to run the speculative task

Distinguishing between nodes slightly slower than the mean and stragglers


Scheduling in Hadoop

Assumptions made by Hadoop Scheduler:

Nodes can perform work at roughly the same rate

Tasks progress at a constant rate throughout time

Scheduling in Hadoop

Map Task

M1 = 1: Execute map function

M2 = 0: Reorder intermediate results

Reduce Task

R1 = 1/3: Copy data

R2 = 1/3: Order
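With fixed stage weights like these, a task's progress score is the sum of the weights of its finished stages plus the finished share of the current stage. The sketch below assumes that reading; the weight of a third reduce stage (merge) is not listed on this slide, so giving it the remaining 1/3 is an assumption, and the function name is hypothetical.

```python
from fractions import Fraction

# Stage weights as listed on the slide: M1 = 1, M2 = 0, R1 = R2 = 1/3.
# The third reduce weight is an assumption (the remaining 1/3).
MAP_WEIGHTS = [Fraction(1), Fraction(0)]
REDUCE_WEIGHTS = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]

def progress_score(weights, finished_stages, current_fraction=Fraction(0)):
    """Sum the weights of finished stages plus the partial current stage.

    finished_stages: number of stages fully completed.
    current_fraction: share of the current stage already done, in [0, 1].
    """
    if not 0 <= finished_stages <= len(weights):
        raise ValueError("finished_stages out of range")
    if not 0 <= current_fraction <= 1:
        raise ValueError("current_fraction must be in [0, 1]")
    score = sum(weights[:finished_stages], Fraction(0))
    if finished_stages < len(weights):
        score += weights[finished_stages] * current_fraction
    return score
```

For a reduce task that has finished copying and sorting and is three quarters through merging, `progress_score(REDUCE_WEIGHTS, 2, Fraction(3, 4))` evaluates to 11/12.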


Scheduling in Hadoop

Task1: Copy 1/3 (done), Sort 1/3 (done), Merge 1/4 (processing); progress score 11/12

Task2: Copy 1/3 (done), Sort 1/3 (done), Merge 1/4 (processing); progress score 11/12

Task3 (X): Copy 1/3 (done), Sort 1/5 (processing); progress score 8/15

If Average PS is 10/15

Scheduling in Hadoop

Task1: Copy 1/3 (done), Sort 1/3 (done), Merge 1/4 (processing); progress score 11/12

Task2: Copy 1/3 (done), Sort 1/3 (done), Merge 1/4 (processing); progress score 11/12

Task3 (X): Copy 1/3 (done), Sort 1/5 (processing), Merge waiting; progress score 8/15

Times shown on the slide: 20s, 60s, 40s

Scheduling in Hadoop

Task1: Copy 1/3 (done), Sort 1/4 (processing), Merge waiting; progress score 7/12

Task2: Copy 1/3 (done), Sort 1/12 (processing), Merge waiting; progress score 5/12

Times and marks shown on the slide: 20s, 40s, X, X

Scheduling in Hadoop

Task1: Copy 1/3 (done), Sort waiting, Merge waiting; progress score 1/3

Task2: Copy 1/3 (done), Sort 1/12 (processing), Merge waiting; progress score 5/12

Other labels on the slide: 180s, 20s, X, Not data locality, Data locality


The LATE Scheduler

Map Task

M1 = 1: Execute map function

M2 = 0: Reorder intermediate results

Reduce Task

R1 = 1/3: Copy data

R2 = 1/3: Order

The LATE Scheduler

Task1: Copy 1/3 (done), Sort 1/3 (done), Merge 1/4 (processing); progress score 11/12

Task2: Copy 1/3 (done), Sort 1/4 (processing), Merge waiting; progress score 7/12

Times and marks shown on the slide: 40s, 30s, X

The LATE Scheduler

Task1: Copy 1/3 (done), Sort waiting, Merge waiting; progress score 1/3

Task2: Copy 1/3 (done), Sort 1/12 (processing), Merge waiting; progress score 5/12

Other labels on the slide: 180s, 20s, X, Not data locality, Data locality


The LATE Scheduler

To give a speculative task the best chance of finishing before the original task, the algorithm launches speculative tasks only on fast nodes

It does this using a SlowNodeThreshold, a cutoff on the total work a node has performed

Because speculative tasks cost resources, LATE uses two additional heuristics:

A limit on the number of speculative tasks running at once (SpeculativeCap)

A SlowTaskThreshold that determines whether a task is slow enough to be speculated (the comparison uses the task's progress rate)
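The three heuristics can be combined into one selection routine. The sketch below assumes that progress rate is progress score divided by elapsed time, that remaining time is estimated as (1 - progress score) / progress rate, and that both thresholds are plain numeric cutoffs supplied by the caller. The class, function, and parameter names are hypothetical; how LATE derives the thresholds is not stated on the slide.

```python
import math
from dataclasses import dataclass

@dataclass
class RunningTask:
    name: str
    progress_score: float  # in [0, 1]
    elapsed: float         # seconds since the task started

def progress_rate(task):
    """Progress made per second; 0 if the task has not run yet."""
    return task.progress_score / task.elapsed if task.elapsed > 0 else 0.0

def time_to_end(task):
    """Estimated seconds remaining, assuming the rate stays constant."""
    rate = progress_rate(task)
    return math.inf if rate == 0 else (1.0 - task.progress_score) / rate

def pick_speculative_task(tasks, node_work, slow_node_threshold,
                          running_backups, speculative_cap,
                          slow_task_threshold):
    """Return the task to back up on the requesting node, or None."""
    if node_work < slow_node_threshold:       # slow node: never speculate here
        return None
    if running_backups >= speculative_cap:    # SpeculativeCap reached
        return None
    candidates = [t for t in tasks if progress_rate(t) < slow_task_threshold]
    if not candidates:
        return None
    # Longest approximate time to end goes first.
    return max(candidates, key=time_to_end)
```

Ranking by estimated remaining time rather than by raw progress score is what gives the scheduler its name: the backup goes to the task that would otherwise finish last.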

The SAMR Scheduler

Map Task

M1 = ?: Execute map function

M2 = ?: Reorder intermediate results

Reduce Task

R1 = ?: Copy data

R2 = ?: Order


The SAMR Scheduler

The way to use and update historical information
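The figure for this slide did not survive, so the following is only a sketch of one plausible update rule: the stage weights stored for a node are blended with the weights measured on the task that just finished, with HP (used later in the experiments) as the share given to the stored history. The blending formula and all names are assumptions, not a transcription of the paper's method.

```python
def measured_weights(stage_seconds):
    """Turn the time spent in each stage into weights that sum to 1."""
    total = sum(stage_seconds)
    if total <= 0:
        raise ValueError("total stage time must be positive")
    return [seconds / total for seconds in stage_seconds]

def update_weights(historical, measured, hp):
    """Blend stored weights with freshly measured ones.

    hp: share given to the historical values, in [0, 1] (assumed meaning of HP).
    """
    if not 0 <= hp <= 1:
        raise ValueError("hp must be in [0, 1]")
    if len(historical) != len(measured):
        raise ValueError("weight lists must have the same length")
    return [hp * old + (1 - hp) * new for old, new in zip(historical, measured)]
```

For example, a map task that spent 45 seconds in its first stage and 5 seconds in its second yields measured weights of 0.9 and 0.1, which would then replace the fixed M1 = 1, M2 = 0 over repeated updates.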


The SAMR Scheduler

SLOW_TASK_CAP (STaC)
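The formula on this slide was lost in extraction. A minimal sketch, assuming STaC marks a task as slow when its progress rate falls below (1 - STaC) times the average progress rate of the running tasks; that rule and the function name are assumptions.

```python
def find_slow_tasks(progress_rates, stac):
    """Return tasks whose progress rate is below (1 - stac) * average rate.

    progress_rates: {task_name: progress per second}
    stac: SLOW_TASK_CAP, assumed to lie in [0, 1].
    """
    if not 0 <= stac <= 1:
        raise ValueError("stac must be in [0, 1]")
    if not progress_rates:
        return []
    average = sum(progress_rates.values()) / len(progress_rates)
    cutoff = (1 - stac) * average
    return sorted(name for name, rate in progress_rates.items() if rate < cutoff)
```

Under this reading, a smaller STaC makes the scheduler more aggressive, because tasks only slightly below the average already count as slow.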


The SAMR Scheduler

SLOW_TRACKER_CAP (STrC)


The SAMR Scheduler

SLOW_TRACKER_PRO (STrP)

SlowTrackerNum < STrP * TrackerNum (14)
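Condition (14) caps how many TaskTrackers may be treated as slow. The sketch below assumes a tracker is a slow candidate when its rate is below (1 - STrC) times the average tracker rate, and that when too many qualify only the slowest ones are kept so that (14) still holds. Both the candidate rule and the way the cap is enforced are assumptions; the names are hypothetical.

```python
def find_slow_trackers(tracker_rates, strc, strp):
    """Return the trackers treated as slow, slowest first.

    tracker_rates: {tracker_name: average task rate on that tracker}
    strc: SLOW_TRACKER_CAP, strp: SLOW_TRACKER_PRO, both assumed in [0, 1].
    """
    if not tracker_rates:
        return []
    average = sum(tracker_rates.values()) / len(tracker_rates)
    cutoff = (1 - strc) * average
    candidates = sorted(
        (name for name, rate in tracker_rates.items() if rate < cutoff),
        key=lambda name: tracker_rates[name],
    )
    limit = strp * len(tracker_rates)  # right-hand side of condition (14)
    slow = []
    for name in candidates:
        # Keep adding only while SlowTrackerNum stays strictly below the limit.
        if len(slow) + 1 < limit:
            slow.append(name)
    return slow
```

With 8 trackers and STrP = 0.3 the limit is 2.4, so at most two trackers can be marked slow no matter how many fall below the cutoff.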


The SAMR Scheduler

Launching backup tasks

BackupNum < BP * TaskNum (15), where BP is BACKUP_PRO
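Condition (15) can be read as a guard checked before each backup launch. The sketch assumes a new backup may start only while the current number of backups still satisfies (15); whether the check applies before or after the launch is not stated on the slide, and the function name is hypothetical.

```python
def can_launch_backup(backup_num, task_num, bp):
    """Check condition (15): BackupNum < BP * TaskNum.

    backup_num: backup tasks currently running.
    task_num: total number of tasks.
    bp: BACKUP_PRO, assumed to lie in [0, 1].
    """
    if not 0 <= bp <= 1:
        raise ValueError("bp must be in [0, 1]")
    return backup_num < bp * task_num
```

With 10 tasks and BP = 0.2, a second backup is allowed while one is running, but a third is refused once two are running.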


Experiment

Effect of “HP” on the execution time


Experiment

Effect of “STaC”, “STrC”, and “STrP” on the execution time


Experiment

Effect of “BP” on the execution time


Experiment

Historical information and real information on all 8 nodes


Experiment

HP = 0.2, STaC = 0.3, STrC = 0.2, STrP = 0.3, and BP = 0.2


Experiment

Execution results of “Sort” running on the experiment platform.


Experiment

LATE decreases execution time by about 7%

LATE using historical information decreases execution time by about 15%

SAMR decreases execution time by about 24% compared to Hadoop


Conclusion

Identify the problem in Hadoop’s scheduler

Compare two schedulers for improving the performance of MapReduce in a heterogeneous environment

How to improve the performance of SAMR


Thanks