handing data skew in mapreduce · 2013. 9. 12. · technische universitat munc¨ hen¨ summary:...
TRANSCRIPT
Technische Universitat Munchen
Handing Data Skew in MapReduce
B. Gufler♠, N. Augsten♣, A. Reiser♠, A. Kemper♠
♠Technische Universitat Munchen ♣Free University of Bolzano-Bozen
May 8, 2011
Technische Universitat Munchen
Cloud Data Processing with MapReduce
MapReduce for BusinessI usage statistics, index building, . . .I simple reducer tasks
MapReduce for eScience: new challenges
1. heavy data skew
2. complex reducer tasks
; How to deal with these problems?
1
Technische Universitat Munchen
Cloud Data Processing with MapReduce
MapReduce for BusinessI usage statistics, index building, . . .I simple reducer tasks
MapReduce for eScience: new challenges
1. heavy data skew
2. complex reducer tasks
; How to deal with these problems?
1
Technische Universitat Munchen
Cloud Data Processing with MapReduce
MapReduce for BusinessI usage statistics, index building, . . .I simple reducer tasks
MapReduce for eScience: new challenges
1. heavy data skew
2. complex reducer tasks
; How to deal with these problems?
1
Technische Universitat Munchen
Cloud Data Processing with MapReduce
MapReduce for BusinessI usage statistics, index building, . . .I simple reducer tasks
MapReduce for eScience: new challenges
1. heavy data skew
2. complex reducer tasks
; How to deal with these problems?
1
Technische Universitat Munchen
Data Skew
I values of an attribute not evenly distributedI very common in scientific dataI causes load imbalance if attribute is used for partitioning
2
Technische Universitat Munchen
Example: Node Masses in Millennium Simulation
0
100
200
300
400nodes (∗1M)
node mass1–
50
434M
51–5
00
295M
501–
5k
29M
5k–5
0k
2.5M
50k–
500k
143k
500k
–5M
1.8k
3
Technische Universitat Munchen
Complex Reducer Tasks
I complex scientific analysis algorithmsI often polynomial or even exponential complexity
I amplifies load imbalance
Map
Map
Map
Reduce
Reduce
6 · 22 = 24 42 + 82 = 80
4
Technische Universitat Munchen
Complex Reducer Tasks
I complex scientific analysis algorithmsI often polynomial or even exponential complexity
I amplifies load imbalance
Map
Map
Map
Reduce
Reduce
6 · 22 = 24
42 + 82 = 80
4
Technische Universitat Munchen
Complex Reducer Tasks
I complex scientific analysis algorithmsI often polynomial or even exponential complexity
I amplifies load imbalance
Map
Map
Map
Reduce
Reduce
6 · 22 = 24 42 + 82 = 80
4
Technische Universitat Munchen
Example: Frequent Subtree Mining
I find common substructures in(large parts of) a forest
I exponential complexity forunordered subtrees
aT1
b c
d a
dT2
a
c
a
bT3
a
d
c
d
5
Technische Universitat Munchen
Example: Frequent Subtree Mining
I find common substructures in(large parts of) a forest
I exponential complexity forunordered subtrees
aT1
b c
d a
dT2
a
c
a
bT3
a
d
c
d
5
Technische Universitat Munchen
Example: Frequent Subtree Mining
I find common substructures in(large parts of) a forest
I exponential complexity forunordered subtrees
aT1
b c
d a
dT2
a
c
a
bT3
a
d
c
d
5
Technische Universitat Munchen
Example: Frequent Subtree Mining
I find common substructures in(large parts of) a forest
I exponential complexity forunordered subtrees
aT1
b c
d a
dT2
a
c
a
bT3
a
d
c
d
5
Technische Universitat Munchen
Summary: MapReduce for eScience
ChallengesI load imbalance due to data skewI complex analysis amplifies execution time differencesI example: frequent subtree mining (PathJoin algorithm)
I 16 GB data set (subset of the Millennium simulation)I cluster of 16 nodesI total execution time: 8 hoursI execution time difference of reducers: up to 6 hours
6
Technische Universitat Munchen
Agenda
I MotivationI Load Balancing in MapReduce: Big PictureI Estimation of Partition CostI Assignment of Partitions to ReducersI Number of PartitionsI Evaluation
7
Technische Universitat Munchen
Data Partitioning in MapReduce
Map
cluster: all itemswith the samekey
partition: all keyswith the samehash values
hash function fordata partitioning
8
Technische Universitat Munchen
Load Balancing: Current MapReduceP
0P
1
P0
P1
P0
P1
Map
Red
uce
P0 P1 cont
rolle
r
I hash partitioning on each mapperI static 1:1 assignment of partitions to
reducersI balances number of keys per
reducer instead of workloadI no flexibility
9
Technische Universitat Munchen
Improved Load Balancing
I create more partitions than there are reducersI cost-based assignment of partitions to reducers
P0
P1
P2
P0
P1
P2
P0
P1
P2
Map
Red
uce
P0+P2 P1 cont
rolle
r
I flexibility for load balancingI challenges
1. compute number ofpartitions
2. estimate partition cost3. assign partitions to
reducers
10
Technische Universitat Munchen
Estimate Partition Cost
partition cost depends on1. number and size of the clusters within the partition
I locally monitored on each mapperI centrally aggregated on one controller
2. complexity of the reducer algorithmI specified by the user
; estimate cluster cost based on this information
; sum up cluster costs to obtain partition cost
11
Technische Universitat Munchen
Estimate Partition Cost: Monitoring
I tuple countI monitor local count on every mapperI sum up on controller
I cluster countI create Bloom filter on every mapperI employ Linear Counting on controller
12
Technische Universitat Munchen
Estimate Partition Cost: MonitoringM
appe
r1
tuples: 17
clusters: [10011]
P0
Map
per2
tuples: 31
clusters: [01011]
P0
13
Technische Universitat Munchen
Estimate Partition Cost: MonitoringM
appe
r1
tuples: 17
clusters: [10011]
P0
Map
per2
tuples: 31
clusters: [01011]
P0
Agg
rega
te
13
Technische Universitat Munchen
Estimate Partition Cost: MonitoringM
appe
r1
tuples: 17
clusters: [10011]
P0
Map
per2
tuples: 31
clusters: [01011]
P0
Agg
rega
te
∑
13
Technische Universitat Munchen
Estimate Partition Cost: MonitoringM
appe
r1
tuples: 17
clusters: [10011]
P0
Map
per2
tuples: 31
clusters: [01011]
P0
Agg
rega
tetuples: 48∑
13
Technische Universitat Munchen
Estimate Partition Cost: MonitoringM
appe
r1
tuples: 17
clusters: [10011]
P0
Map
per2
tuples: 31
clusters: [01011]
P0
Agg
rega
tetuples: 48∑|
13
Technische Universitat Munchen
Estimate Partition Cost: MonitoringM
appe
r1
tuples: 17
clusters: [10011]
P0
Map
per2
tuples: 31
clusters: [01011]
P0
Agg
rega
tetuples: 48∑| lc
K. Whang, B. Vander-Zanden, H. Taylor: A Linear-Time Probabilistic Counting Algorithm for Databa-se Applications, ACM TODS 15(2), 1990
13
Technische Universitat Munchen
Estimate Partition Cost: MonitoringM
appe
r1
tuples: 17
clusters: [10011]
P0
Map
per2
tuples: 31
clusters: [01011]
P0
Agg
rega
tetuples: 48
clusters: 8
∑| lc
K. Whang, B. Vander-Zanden, H. Taylor: A Linear-Time Probabilistic Counting Algorithm for Databa-se Applications, ACM TODS 15(2), 1990
13
Technische Universitat Munchen
Estimate Partition Cost: Calculation
Agg
rega
te tuples: 48
clusters: 8
reducer complexity: quadratic(e. g., pairwise comparison)
I average cluster size: 488 = 6
I cluster cost: 62 = 36I partition cost: 8 · 36 = 288
14
Technische Universitat Munchen
Estimate Partition Cost: Calculation
Agg
rega
te tuples: 48
clusters: 8
reducer complexity: quadratic(e. g., pairwise comparison)
I average cluster size: 488 = 6
I cluster cost: 62 = 36I partition cost: 8 · 36 = 288
14
Technische Universitat Munchen
Estimate Partition Cost: Calculation
Agg
rega
te tuples: 48
clusters: 8
reducer complexity: quadratic(e. g., pairwise comparison)
I average cluster size: 488 = 6
I cluster cost: 62 = 36
I partition cost: 8 · 36 = 288
14
Technische Universitat Munchen
Estimate Partition Cost: Calculation
Agg
rega
te tuples: 48
clusters: 8
reducer complexity: quadratic(e. g., pairwise comparison)
I average cluster size: 488 = 6
I cluster cost: 62 = 36I partition cost: 8 · 36 = 288
14
Technische Universitat Munchen
Assign Partitions to Reducers
I assign partitions to reducers balancing the partition costI bin packing problem, but: NP-hardI rely on a heuristic instead
50 40 20 15 3
15
Technische Universitat Munchen
Assign Partitions to Reducers
I assign partitions to reducers balancing the partition costI bin packing problem, but: NP-hardI rely on a heuristic instead
50 40 20 15 3
reducer 1 reducer 20 0
15
Technische Universitat Munchen
Assign Partitions to Reducers
I assign partitions to reducers balancing the partition costI bin packing problem, but: NP-hardI rely on a heuristic instead
50 40 20 15 3
reducer 1 reducer 2050
15
Technische Universitat Munchen
Assign Partitions to Reducers
I assign partitions to reducers balancing the partition costI bin packing problem, but: NP-hardI rely on a heuristic instead
50 40 20 15 3
reducer 1 reducer 250 40
15
Technische Universitat Munchen
Assign Partitions to Reducers
I assign partitions to reducers balancing the partition costI bin packing problem, but: NP-hardI rely on a heuristic instead
50 40 20 15 3
reducer 1 reducer 250 60
15
Technische Universitat Munchen
Assign Partitions to Reducers
I assign partitions to reducers balancing the partition costI bin packing problem, but: NP-hardI rely on a heuristic instead
50 40 20 15 3
reducer 1 reducer 26065
15
Technische Universitat Munchen
Assign Partitions to Reducers
I assign partitions to reducers balancing the partition costI bin packing problem, but: NP-hardI rely on a heuristic instead
50 40 20 15 3
reducer 1 reducer 265 63
15
Technische Universitat Munchen
How many Partitions?
I static approachI number of partitions fixed beforehand by controllerI matching partitions created on every mapper
I dynamic approachI only initial number of partitions fixedI mappers may split some partitions further
16
Technische Universitat Munchen
How many Partitions?
I static approachI number of partitions fixed beforehand by controllerI matching partitions created on every mapper
I dynamic approachI only initial number of partitions fixedI mappers may split some partitions further
16
Technische Universitat Munchen
Evaluation
I load balancing qualityI influence on execution time
Synthetic Data SetsI 520M tuplesI 2 000 clustersI Zipf distributions
Millennium Data SetI 760M tuplesI ≈ 32 000 clusters
17
Technische Universitat Munchen
Evaluation: Load Balancing
I measure standard deviation of partition cost per reducerI small is good: small differences between reducers
01020304050
10 20 25 50
stdev [% of mean]
reducers
Synthetic Data, Moderate Skew
01020304050
10 20 25 50
stdev [% of mean]
reducers
Millennium Data
Standard MapReduce Fine Partitioning
18
Technische Universitat Munchen
Evaluation: Execution Time
I measure execution time per reducerI max time: overall time of reduce job
05
101520
10 20 25 50
Synthetic Execution Time
reducers
Synthetic Data, Moderate Skew
010203040
10 20 25 50
Synthetic Execution Time
reducers
Millennium Data
Standard MapReduce Fine PartitioningTime for Most Expensive Cluster
19
Technische Universitat Munchen
Summary
I scientific data analysis with MapReduceI partition-based load balancingI distributed monitoring for cost estimationI experimental evaluation
I better load balancingI reduced overall execution time ∑
| lc
20
Technische Universitat Munchen
Thank you!
21