On the Energy (In)efficiency of Hadoop: Scale-down Efficiency
TRANSCRIPT
Jacob Leverich
and Christos Kozyrakis
Stanford University
On the Energy (In)efficiency of Hadoop:
Scale-down Efficiency
The current design of Hadoop precludes
scale-down of commodity clusters.
Outline
Hadoop crash-course
Scale-down efficiency
How Hadoop precludes scale-down
How to fix it
Did we fix it?
Future work
Hadoop crash-course
Hadoop == Distributed Processing Framework
1000s of nodes, PBs of data
Hadoop MapReduce ≈ Google MapReduce
  Tasks are automatically distributed by the framework.
Hadoop Distributed File System ≈ Google File System
  Files are divided into large (64MB) blocks; this amortizes overheads.
  Blocks are replicated for availability and durability.
Scale-down motivation
[Figure: HP ProLiant DL140 G3 power draw at typical utilization levels; Barroso and Hölzle, 2007]
Scale-down for energy proportionality
Four nodes at 40% utilization:
  4 x P(40%) = 4 x 325W = 1300W
Two nodes at 80% utilization, two asleep:
  2 x P(80%) = 2 x 365W = 730W
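The consolidation arithmetic above can be sketched in a few lines; the per-node power figures P(40%) = 325 W and P(80%) = 365 W are the slide's, while the helper name is mine:

```python
def cluster_power(per_node_watts, active_nodes):
    """Total power when `active_nodes` servers run and the rest sleep at ~0 W."""
    return per_node_watts * active_nodes

spread = cluster_power(325, 4)        # 4 nodes at 40% utilization
consolidated = cluster_power(365, 2)  # 2 nodes at 80%, 2 nodes asleep

savings = 1 - consolidated / spread
print(spread, consolidated, round(savings, 2))  # 1300 730 0.44
```

Consolidation nearly halves the power bill because idle servers are far from energy-proportional: a node at 40% utilization already draws most of its peak power.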
The problem: storage consolidation
Hadoop Distributed File System…
Consolidate computation? Easy.
Consolidate storage? Not (as) easy.
“All servers must be available, even during low-load periods.” [Barroso and Holzle, 2007]
Hadoop inherited this “feature” from Google File System
:-(
HDFS and block replication
1st replica local, others remote.
Allocate evenly.
Replication factor = 3
[Figure: “block replication table” — blocks A–H (rows) across nodes 1–9 (columns); each marked cell is a replica]
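The placement rule above can be illustrated with a toy sketch (my own, not Hadoop's actual placement code): keep the first replica on the writer's local node and spread the remaining replicas evenly across the least-loaded remote nodes.

```python
def place_block(local_node, nodes, load, replication=3):
    """Pick `replication` nodes to hold a new block's replicas."""
    replicas = [local_node]  # 1st replica stays local
    # Remote candidates, least-loaded first, to keep allocation even.
    remote = sorted((n for n in nodes if n != local_node), key=load.__getitem__)
    replicas += remote[:replication - 1]
    for n in replicas:
        load[n] += 1  # track per-node block counts for future placements
    return replicas

nodes = list(range(1, 10))        # nodes 1..9, as in the figure
load = {n: 0 for n in nodes}
placement = place_block(local_node=3, nodes=nodes, load=load)
print(placement)  # [3, 1, 2]
```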
Attempted scale-down
[Figure: the block replication table with several nodes put to sleep; their replicas become unavailable]
Problems:
  Scale-down vs. self-healing
    Wasted capacity: sleeping replicas != lost replicas
    Flurry of net & disk activity!
  Which nodes to disable?
    Must maintain data availability
How to fix it
Scale-down vs. self-healing
  Wasted capacity: sleeping replicas != lost replicas
  Flurry of net & disk activity!
  Fix: Self-non-healing
Which nodes to disable?
  Must maintain data availability
Self-non-healing
[Figure: the block replication table with sleeping nodes (Zzzzz…); their replicas are retained rather than re-replicated]
Coordinate with Hadoop when we put a node to sleep
Prevent block re-replications
New RPCs in HDFS primary node
sleepNode(String hostname)
Similar to node decommissioning, but don’t replicate blocks
% hadoop dfsadmin -sleepNode 10.10.1.80:50020
Save blocks to a “sleeping blocks” map for bookkeeping
Ignore heartbeats and block reports from this node
wakeNode(String hostname)
Watch for heartbeats, force node to send block report
Execute arbitrary commands (e.g. send a wake-on-LAN packet)
wakeBlock(Block target)
Wake a sleeping node that has a particular block
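A minimal sketch of the bookkeeping these RPCs imply; the class and field names are mine, not the authors' patch. The primary node parks a sleeping node's blocks in a "sleeping blocks" map instead of re-replicating them, and can wake whichever node holds a needed block.

```python
class PrimaryNode:
    def __init__(self):
        self.block_map = {}        # hostname -> set of block IDs it stores
        self.sleeping_blocks = {}  # hostname -> blocks parked on that node

    def sleep_node(self, hostname):
        # Like decommissioning, but park the blocks instead of re-replicating.
        # From now on, heartbeats/block reports from this node are ignored.
        self.sleeping_blocks[hostname] = self.block_map.pop(hostname)

    def wake_node(self, hostname):
        # On wake, restore the node and expect a fresh block report.
        self.block_map[hostname] = self.sleeping_blocks.pop(hostname)

    def wake_block(self, block):
        # Wake some sleeping node that holds a replica of `block`.
        for host, blocks in self.sleeping_blocks.items():
            if block in blocks:
                self.wake_node(host)  # real code would also send wake-on-LAN
                return host
        return None

nn = PrimaryNode()
nn.block_map = {"n1": {"A", "B"}, "n2": {"B", "C"}}
nn.sleep_node("n2")
print(nn.wake_block("C"))  # n2
```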
How to fix it
Scale-down vs. self-healing
  Wasted capacity: sleeping replicas != lost replicas
  Flurry of net & disk activity!
  Fix: Self-non-healing
Which nodes to disable?
  Must maintain data availability
  Fix: “Covering Subset” replication invariant
Replication placement invariants
Hadoop uses simple invariants to direct block placement
Example: Rack-Aware Block Placement
Protects against common-mode failures (e.g. switch failure, power-delivery failure)
Invariant: Blocks must have replicas on at least 2 racks.
Is there some energy-efficient replication invariant?
Must inform our decision on which nodes we can disable.
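To make the idea of a placement invariant concrete, here is a tiny check of my own (not Hadoop's code) for the rack-aware rule above: a block's replicas must span at least two racks.

```python
def spans_two_racks(replica_nodes, rack_of):
    """True if one block's replicas sit on at least 2 distinct racks."""
    return len({rack_of[n] for n in replica_nodes}) >= 2

rack_of = {"n1": "rackA", "n2": "rackA", "n3": "rackB"}
print(spans_two_racks(["n1", "n2"], rack_of))  # False: one rack is a common-mode risk
print(spans_two_racks(["n1", "n3"], rack_of))  # True
```

An energy-efficiency invariant would be checked the same way, but over power domains (which nodes stay on) rather than failure domains.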
Covering subset replication invariant
Goal:
Maximize the number of servers that can simultaneously sleep.
Strategy:
Aggregate live data onto a “covering subset” of nodes.
Never turn off a node in the covering subset.
Invariant:
Every block must have one replica in the covering subset.
Covering subset replication invariant
[Figure: the block replication table under the covering-subset invariant — every block keeps one replica on the always-on subset, so all other nodes can sleep (Zzzzz…)]
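A sketch (my construction) of checking the covering-subset invariant: if every block keeps at least one replica on the always-on subset, the entire complement of the subset is safe to power down at once.

```python
def invariant_holds(placements, covering_subset):
    """placements: block ID -> set of nodes holding its replicas."""
    return all(nodes & covering_subset for nodes in placements.values())

placements = {
    "A": {1, 4, 7},
    "B": {2, 5, 8},
    "C": {3, 6, 9},
}
covering = {1, 2, 3}
print(invariant_holds(placements, covering))  # True
print(sorted(set(range(1, 10)) - covering))   # nodes free to sleep: [4, 5, 6, 7, 8, 9]
```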
How to fix it
Scale-down vs. self-healing
  Wasted capacity: sleeping replicas != lost replicas
  Flurry of net & disk activity!
  Fix: Self-non-healing
Which nodes to disable?
  Must maintain data availability
  Fix: “Covering Subset” replication invariant
Evaluation
Methodology
Disable n nodes, compare Hadoop job energy & performance
  Individual runs of webdata_sort / webdata_scan from GridMix
  30-minute job batches (with some idle time!)
Cluster: 36 nodes, HP ProLiant DL140 G3
  2 quad-core Xeon 5335s each, 32GB RAM, 500GB disk
  9-node covering subset (1/4 of the cluster)
Energy model: validated estimate based on CPU utilization
  Disabled node = 0 watts
  Possible to evaluate hypothetical hardware
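A CPU-utilization-based energy model of this kind is often linear between idle and peak power. The sketch below is my own; the idle/peak coefficients are fitted to the two per-node figures quoted earlier in the talk (P(40%) = 325 W, P(80%) = 365 W) under a linearity assumption, not taken from the paper's model.

```python
def node_power(utilization, idle_w=285.0, peak_w=385.0, disabled=False):
    """Estimate one server's power draw from CPU utilization in [0, 1]."""
    if disabled:
        return 0.0  # a disabled node is modeled as drawing 0 W
    return idle_w + (peak_w - idle_w) * utilization

# Cluster power = sum over nodes; here two nodes at 80%, two disabled.
cluster = [0.8, 0.8, 0.0, 0.0]
power = sum(node_power(u, disabled=(u == 0.0)) for u in cluster)
print(round(power))  # 730
```

Because the model is just per-node arithmetic, swapping in hypothetical idle/peak coefficients lets you evaluate hardware you don't own, which is the point made above.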
Results: Performance
It slows down (obviously)
Peak performance benchmark
Sort (network intensive) worse off than Scan
Amdahl’s Law
Results: Energy
Less energy consumed for the same amount of work
9% to 51% saved
Extra nodes add more to energy consumption than they add to performance
Slower systems are usually more efficient; high performance is a trade-off!
Results: Power
Excellent knob for cluster-level power capping
Much larger dynamic range than tweaking frequency/voltage at the server level
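The power-capping knob above amounts to simple arithmetic: disable whole nodes until worst-case cluster draw fits under the cap. The sketch is mine; the 385 W per-node peak is a placeholder figure, not a measurement from the paper.

```python
def nodes_to_disable(total_nodes, peak_node_w, power_cap_w):
    """Fewest whole nodes to turn off so peak cluster draw fits the cap."""
    affordable = int(power_cap_w // peak_node_w)  # nodes the cap can sustain
    return max(0, total_nodes - affordable)

# 36-node cluster, hypothetical 385 W per-node peak, 10 kW cap:
print(nodes_to_disable(36, 385, 10_000))  # 11
```

Each disabled node removes its full peak draw from the budget, which is why this knob has a much larger dynamic range than shaving tens of watts per server with frequency/voltage scaling.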
Results: The Bottom Line
Operational Hadoop clusters can scale down.
We reduce energy consumption
at the expense of single-job latency.
Continuing Work
Covering subset: mechanism vs. policy
The replication invariant is a mechanism.
Which nodes constitute the subset is policy (an open question).
Size trade-off
  Too small: low capacity and a performance bottleneck
  Too large: wasted energy on idle nodes
  1 / (replication factor) of the cluster is a reasonable starting point
How many covering subsets?
  Invariant: Blocks must have a replica in each covering subset.
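The sizing heuristic above is simple arithmetic (my sketch): with replication factor r, roughly 1/r of the cluster can hold one full copy of the data, so that is a natural starting size for the covering subset.

```python
import math

def covering_subset_size(total_nodes, replication_factor):
    """Starting-point heuristic: 1/r of the cluster holds one full data copy."""
    return math.ceil(total_nodes / replication_factor)

print(covering_subset_size(36, 3))  # 12; the evaluation instead used 9 nodes (1/4 of the cluster)
```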
Quantify Trade-offs
Random Fault Injection experiments
What happens when a covering subset node fails?
How much do you trust idle disks?
Performance
Availability
Energy consumption
Durability
Dynamic Power Management
Algorithmically decide which nodes to sleep or wake up
What signals to use?
CPU utilization?
Disk/net utilization?
Job Queue length?
MapReduce and HDFS must cooperate
e.g. idle nodes may host transient Map outputs
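A toy policy (entirely my own, not a proposed design) built on one of the candidate signals above, job-queue length: scale the active-node count with the queue, never dropping below the always-on covering subset.

```python
def target_active_nodes(queue_len, min_nodes, max_nodes, jobs_per_node=2):
    """Toy policy: grow/shrink active nodes with queue length, clamped."""
    want = min_nodes + queue_len // jobs_per_node
    return min(max_nodes, max(min_nodes, want))

# 9-node covering subset always on; 36 nodes total.
print(target_active_nodes(0, 9, 36))   # 9: an idle cluster runs just the subset
print(target_active_nodes(30, 9, 36))  # 24
print(target_active_nodes(99, 9, 36))  # 36: capped at the full cluster
```

A real controller would also need hysteresis (wake-up latency is seconds to minutes) and the MapReduce/HDFS coordination noted above, since a "sleepable" node may still hold transient Map outputs.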
Workloads
Benchmarks
HBase/BigTable vs. MapReduce
Short, unpredictable data access vs. long streaming access
Quality of service and throughput are important
Pig vs. Sort+Scan
Recorded job traces vs. random job traces
Peak performance vs. fractional utilization
What are typical usage patterns?
Scale
36 nodes to 1000 nodes; emergent behaviors?
Network hierarchy
Hadoop framework inefficiencies
Computational overhead (must process many block reports!)
Experiments on Amazon EC2
Awarded an Amazon Web Services grant
Can’t measure power! Must use a model.
Any Amazonians here? Let’s make a validated energy model.