handing data skew in mapreduce · 2013. 9. 12. · technische universitat munc¨ hen¨ summary:...

Technische Universität München

Handling Data Skew in MapReduce

B. Gufler♠, N. Augsten♣, A. Reiser♠, A. Kemper♠

♠Technische Universität München ♣Free University of Bolzano-Bozen

May 8, 2011


Cloud Data Processing with MapReduce

MapReduce for Business
- usage statistics, index building, ...
- simple reducer tasks

MapReduce for eScience: new challenges

1. heavy data skew

2. complex reducer tasks

⇒ How to deal with these problems?


Data Skew

- values of an attribute are not evenly distributed
- very common in scientific data
- causes load imbalance if the attribute is used for partitioning


Example: Node Masses in Millennium Simulation

[Figure: histogram of the number of nodes per node-mass range; y-axis: nodes (×1M), 0 to 400]

node mass     nodes
1–50          434M
51–500        295M
501–5k        29M
5k–50k        2.5M
50k–500k      143k
500k–5M       1.8k


Complex Reducer Tasks

- complex scientific analysis algorithms
  - often polynomial or even exponential complexity
- amplifies load imbalance

[Figure: three Map tasks feed two Reduce tasks; the reducer costs shown are $6 \cdot 2^2 = 24$ and $4^2 + 8^2 = 80$]
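The arithmetic in the figure can be reproduced with a short sketch. It assumes, for illustration only, that the reducer cost is the sum of squared cluster sizes, and that one reducer receives six clusters of 2 tuples while the other receives two clusters of 4 and 8 tuples. The function name `reducer_cost` is hypothetical.

```python
def reducer_cost(cluster_sizes, exponent=2):
    """Cost of a reducer: sum of cluster_size ** exponent over its clusters."""
    return sum(size ** exponent for size in cluster_sizes)

# Assumed reading of the figure: both reducers receive 12 tuples in total.
reducer_a = [2] * 6   # six clusters of 2 tuples each
reducer_b = [4, 8]    # two clusters of 4 and 8 tuples

cost_a = reducer_cost(reducer_a)   # 6 * 2**2
cost_b = reducer_cost(reducer_b)   # 4**2 + 8**2
print(sum(reducer_a), sum(reducer_b), cost_a, cost_b)
```

Under these assumptions both reducers process the same number of tuples, yet the one holding the larger clusters does more than three times the work.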







Example: Frequent Subtree Mining

- find common substructures in (large parts of) a forest
- exponential complexity for unordered subtrees

[Figure: example forest of three trees T1, T2, T3 with node labels a, b, c, d]


Summary: MapReduce for eScience

Challenges
- load imbalance due to data skew
- complex analysis amplifies execution time differences
- example: frequent subtree mining (PathJoin algorithm)
  - 16 GB data set (subset of the Millennium simulation)
  - cluster of 16 nodes
  - total execution time: 8 hours
  - execution time difference of reducers: up to 6 hours


Agenda

- Motivation
- Load Balancing in MapReduce: Big Picture
- Estimation of Partition Cost
- Assignment of Partitions to Reducers
- Number of Partitions
- Evaluation


Data Partitioning in MapReduce

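- cluster: all items with the same key
- partition: all keys with the same hash value
- a hash function is used for data partitioning

[Figure: output of a Map task, with clusters grouped into partitions by the hash function]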
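The relationship between keys, clusters, and partitions can be sketched as follows. This is a minimal illustration, not the framework's implementation: CRC32 stands in for whatever hash function is actually used, and the names `partition_of` and `partition_tuples` are hypothetical.

```python
import zlib
from collections import defaultdict

def partition_of(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a deterministic hash (CRC32, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def partition_tuples(tuples, num_partitions):
    """Group (key, value) tuples into partitions; each partition maps key -> cluster."""
    partitions = [defaultdict(list) for _ in range(num_partitions)]
    for key, value in tuples:
        partitions[partition_of(key, num_partitions)][key].append(value)
    return partitions

data = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]
parts = partition_tuples(data, num_partitions=2)
for index, clusters in enumerate(parts):
    print(index, dict(clusters))
```

Because the partition depends only on the key, a cluster is never split across partitions, while one partition may hold several clusters.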


Load Balancing: Current MapReduce

- hash partitioning on each mapper
- static 1:1 assignment of partitions to reducers
- balances number of keys per reducer instead of workload
- no flexibility

[Figure: every Map task produces partitions P0 and P1; the controller assigns P0 to one Reduce task and P1 to the other]


Improved Load Balancing

- create more partitions than there are reducers
- cost-based assignment of partitions to reducers

[Figure: every Map task produces partitions P0, P1, and P2; the controller assigns P0+P2 to one Reduce task and P1 to the other]

- flexibility for load balancing
- challenges

1. compute number of partitions

2. estimate partition cost

3. assign partitions to reducers


Estimate Partition Cost

partition cost depends on

1. number and size of the clusters within the partition
   - locally monitored on each mapper
   - centrally aggregated on one controller

2. complexity of the reducer algorithm
   - specified by the user

⇒ estimate cluster cost based on this information

⇒ sum up cluster costs to obtain partition cost


Estimate Partition Cost: Monitoring

- tuple count
  - monitor local count on every mapper
  - sum up on controller
- cluster count
  - create Bloom filter on every mapper
  - employ Linear Counting on controller


Estimate Partition Cost: Monitoring

Partition P0 as monitored on two mappers:

Mapper 1     tuples: 17    clusters: [10011]
Mapper 2     tuples: 31    clusters: [01011]

Aggregate    tuples: 48    clusters: 8

- tuples: sum (∑) of the local counts
- clusters: bitwise OR (|) of the local bit vectors, then Linear Counting (lc)

K. Whang, B. Vander-Zanden, H. Taylor: A Linear-Time Probabilistic Counting Algorithm for Database Applications, ACM TODS 15(2), 1990
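The aggregation step for this example can be sketched in a few lines. The sketch assumes the standard Linear Counting estimator from the cited paper, n ≈ -m · ln(z / m), where m is the length of the bit vector and z the number of zero bits; the function name `linear_counting` is hypothetical.

```python
import math

def linear_counting(bits):
    """Estimate the number of distinct items from a bit vector: -m * ln(z / m)."""
    m = len(bits)
    zero_bits = bits.count(0)
    if zero_bits == 0:
        raise ValueError("bit vector is saturated; the estimate is undefined")
    return -m * math.log(zero_bits / m)

mapper1 = [1, 0, 0, 1, 1]   # bit vector reported by mapper 1 for P0
mapper2 = [0, 1, 0, 1, 1]   # bit vector reported by mapper 2 for P0

# The controller sums the tuple counts and ORs the bit vectors.
tuples = 17 + 31
combined = [a | b for a, b in zip(mapper1, mapper2)]
clusters = linear_counting(combined)
print(tuples, combined, round(clusters))
```

With one zero bit left out of five, the estimate is -5 · ln(1/5) ≈ 8.05, which rounds to the 8 clusters shown for the aggregate.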


Estimate Partition Cost: Calculation

Aggregate: tuples: 48, clusters: 8

reducer complexity: quadratic (e.g., pairwise comparison)

- average cluster size: $48 / 8 = 6$
- cluster cost: $6^2 = 36$
- partition cost: $8 \cdot 36 = 288$
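The calculation generalizes to any user-specified complexity function. The following is a minimal sketch; the function name `partition_cost` and the way the complexity is passed in as a callable are assumptions made for illustration.

```python
def partition_cost(tuples, clusters, complexity=lambda n: n ** 2):
    """Estimate partition cost as clusters * complexity(average cluster size)."""
    avg_cluster_size = tuples / clusters
    return clusters * complexity(avg_cluster_size)

# Values from the example: 48 tuples in 8 clusters, quadratic reducer.
cost = partition_cost(tuples=48, clusters=8)
print(cost)
```

Using the average cluster size treats all clusters of a partition as equally large, which is exactly what the arithmetic on the slide does.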


Assign Partitions to Reducers

- assign partitions to reducers balancing the partition cost
- bin packing problem, but: NP-hard
- rely on a heuristic instead

[Figure: partitions with costs 50, 40, 20, 15, 3 are assigned step by step to reducer 1 and reducer 2; the reducer loads that survive from the animation are 0 and 0, then 0 and 50, then 60 and 65]
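The slide does not name the heuristic. One common choice, assumed here, is a greedy strategy that takes the partitions in order of decreasing cost and always gives the next one to the reducer with the smallest load so far. The function name `assign_partitions` is hypothetical.

```python
import heapq

def assign_partitions(costs, num_reducers):
    """Greedy heuristic: most expensive partition first, to the least loaded reducer."""
    heap = [(0, r) for r in range(num_reducers)]   # (current load, reducer index)
    heapq.heapify(heap)
    assignment = [[] for _ in range(num_reducers)]
    for cost in sorted(costs, reverse=True):
        load, r = heapq.heappop(heap)              # least loaded reducer
        assignment[r].append(cost)
        heapq.heappush(heap, (load + cost, r))
    return assignment

assignment = assign_partitions([50, 40, 20, 15, 3], num_reducers=2)
loads = [sum(partition_costs) for partition_costs in assignment]
print(assignment, loads)
```

For the partition costs in the figure this yields loads of 65 and 63, and its intermediate states (50 and 0, later 65 and 60) are consistent with the loads that survive from the animation.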


How many Partitions?

- static approach
  - number of partitions fixed beforehand by controller
  - matching partitions created on every mapper
- dynamic approach
  - only initial number of partitions fixed
  - mappers may split some partitions further


Evaluation

- load balancing quality
- influence on execution time

Synthetic Data Sets
- 520M tuples
- 2 000 clusters
- Zipf distributions

Millennium Data Set
- 760M tuples
- ≈ 32 000 clusters


Evaluation: Load Balancing

- measure standard deviation of partition cost per reducer
- small is good: small differences between reducers

[Figure: two bar charts, "Synthetic Data, Moderate Skew" and "Millennium Data"; x-axis: reducers (10, 20, 25, 50); y-axis: stdev [% of mean], 0 to 50; series: Standard MapReduce, Fine Partitioning]
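The metric on the y-axis can be computed as sketched below. The sketch assumes the population standard deviation (the slide does not say whether population or sample is used), and the input values are made-up illustrations, not measurements from the evaluation. The function name `imbalance_percent` is hypothetical.

```python
import statistics

def imbalance_percent(reducer_costs):
    """Standard deviation of per-reducer cost, as a percentage of the mean."""
    mean = statistics.mean(reducer_costs)
    return 100.0 * statistics.pstdev(reducer_costs) / mean

# Illustrative per-reducer costs (hypothetical values).
balanced = imbalance_percent([65, 63])
skewed = imbalance_percent([110, 18])
print(balanced, skewed)
```

Normalizing by the mean makes the values comparable across runs with different numbers of reducers and different total cost.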


Evaluation: Execution Time

- measure execution time per reducer
- max time: overall time of reduce job

[Figure: two bar charts, "Synthetic Data, Moderate Skew" (y-axis 0 to 20) and "Millennium Data" (y-axis 0 to 40); x-axis: reducers (10, 20, 25, 50); y-axis label as extracted: "Synthetic Execution Time"; series: Standard MapReduce, Fine Partitioning, Time for Most Expensive Cluster]


Summary

- scientific data analysis with MapReduce
- partition-based load balancing
- distributed monitoring for cost estimation
- experimental evaluation
  - better load balancing
  - reduced overall execution time


Thank you!
