
Post on 03-Jan-2016

GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks

Íñigo Goiri, Kien Le, Thu D. Nguyen, Jordi Guitart, Jordi Torres, and Ricardo Bianchini

Page 2: Motivation

• Datacenters consume large amounts of energy
• Energy cost is not the only problem
– Brown sources: coal, natural gas…
• Connect datacenters to green sources
– Solar panels, wind turbines…
– Green datacenter
– Early examples in the field

Page 3: Green datacenter

• Energy sources
– Solar/wind: variable over time
– Electrical grid: backup
• Mitigation approaches are not ideal
– Batteries and net metering
• We need to match the energy demand to the supply

[Figure: power vs. time, comparing solar power with the workload's load]

Page 4: Delaying load within time bounds

Delaying some jobs is OK, as long as their time bounds are respected

[Figure: two timelines (nodes and power vs. time) showing jobs J1, J2, and J3]

Page 5: Scheduling data-processing workloads in green datacenters

• Data-processing jobs
– Each task operates on a chunk of data
– Data distributed among servers
• Simple workflow: MapReduce
– Map tasks: process input data
– Reduce tasks: merge maps’ outputs

Challenges
• Match MapReduce workload with green energy availability
– No information on number of nodes, job length, power…
• Conserve energy while ensuring data availability

[Figure: MapReduce workflow with map tasks (Map1 to Map5), a shuffle phase, and reduce tasks]
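The map, shuffle, and reduce phases named on this slide can be illustrated with a tiny in-memory word count. This is only a sketch of the general MapReduce pattern, not Hadoop's API; the function names (`map_task`, `shuffle`, `reduce_task`, `run_job`) and the sample chunks are made up for illustration.

```python
from collections import defaultdict


def map_task(chunk):
    """Map: process one chunk of input and emit (word, 1) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(map_outputs):
    """Shuffle: group the intermediate values from all maps by key."""
    groups = defaultdict(list)
    for pairs in map_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key, values):
    """Reduce: merge all values emitted for one key."""
    return key, sum(values)


def run_job(chunks):
    """Run one map task per chunk, then shuffle, then reduce per key."""
    map_outputs = [map_task(chunk) for chunk in chunks]
    groups = shuffle(map_outputs)
    return dict(reduce_task(key, values) for key, values in groups.items())


# Hypothetical input: each string stands for a data chunk on some server.
chunks = ["green energy", "brown energy", "green datacenter"]
counts = run_job(chunks)
```

In a real cluster each chunk lives on a particular server, which is why the scheduler cannot turn servers off without thinking about where the data is.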

Page 6: Overview of GreenHadoop

• Predict solar energy availability
• May delay jobs but must meet time bounds
– Maximize green energy use
– If not enough green energy, minimize brown electricity cost
– Brown electricity cost = brown energy cost + peak brown power cost
• Deactivate idle servers while keeping data available
• Divided into two parts
1. Computation scheduling
2. Data management
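The two components of the brown electricity cost (energy cost plus peak power cost) can be written out as a small calculation. This is a minimal sketch under stated assumptions: the prices, the per-kW peak charge, and the one-hour slot length are hypothetical values chosen for illustration, and the function name `brown_cost` is not from the talk.

```python
def brown_cost(brown_kw, on_peak, price_on, price_off, peak_charge,
               slot_hours=1.0):
    """Return brown energy cost plus peak brown power cost.

    brown_kw    -- average brown power drawn in each time slot (kW)
    on_peak     -- for each slot, True if it falls in the on-peak period
    price_on    -- on-peak energy price ($/kWh), hypothetical
    price_off   -- off-peak energy price ($/kWh), hypothetical
    peak_charge -- charge per kW of the highest brown draw ($/kW), hypothetical
    """
    energy_cost = sum(
        kw * slot_hours * (price_on if peak else price_off)
        for kw, peak in zip(brown_kw, on_peak)
    )
    peak_cost = max(brown_kw, default=0.0) * peak_charge
    return energy_cost + peak_cost


# Three one-hour slots: off-peak, on-peak, off-peak.
total = brown_cost(
    brown_kw=[1.0, 2.0, 0.5],
    on_peak=[False, True, False],
    price_on=0.20,
    price_off=0.10,
    peak_charge=5.0,
)
```

With these made-up numbers the peak term dominates, which illustrates why a scheduler would try to avoid raising the brown power peak and not only shift work to cheap hours.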

Page 7: 1. Computation scheduling

Estimate the energy required by jobs using an exponentially weighted moving average (EWMA)

[Figure: queue of jobs Job1 to Job6]
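An exponentially weighted moving average keeps a running estimate that is nudged toward each new observation. The sketch below shows the standard EWMA update applied to per-job energy measurements; the smoothing factor `alpha = 0.5` and the sample values are assumptions for illustration, not parameters reported in the talk.

```python
def ewma_update(previous, observed, alpha=0.5):
    """Blend a new observation into the running estimate.

    alpha is the weight given to the newest observation (hypothetical value).
    With no previous estimate, the first observation is used as-is.
    """
    if previous is None:
        return observed
    return alpha * observed + (1.0 - alpha) * previous


# Hypothetical energy (kWh) measured for three completed jobs.
estimate = None
for measured_kwh in [2.0, 4.0, 3.0]:
    estimate = ewma_update(estimate, measured_kwh)
```

A larger `alpha` tracks recent jobs more closely; a smaller one smooths out outliers.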

Page 8: 1. Computation scheduling

Predict energy availability (weather forecast)

Assign green energy first

[Figure: jobs Job1 to Job6 plotted as power vs. time, starting at Now, across off-peak, on-peak, and off-peak periods]

Page 9: 1. Computation scheduling

Assign cheap brown energy

[Figure: jobs Job1 to Job6 plotted as power vs. time, starting at Now, across off-peak, on-peak, and off-peak periods, with the previous peak marked]

Page 10: 1. Computation scheduling

Assign expensive energy

Current power → Active servers

[Figure: jobs Job1 to Job6 plotted as power vs. time, starting at Now, across off-peak, on-peak, and off-peak periods, with the number of active servers marked]
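The three assignment steps on the preceding slides (green energy first, then cheap brown energy, then expensive brown energy) and the final mapping from current power to active servers can be sketched as a greedy fill over time slots. This is a simplified illustration, not the scheduler's actual algorithm: it ignores per-job time bounds and the previous-peak limit, and every name and number (`assign_energy`, `active_servers`, the 0.25 kW per server, one-hour slots) is an assumption.

```python
import math


def assign_energy(required_kwh, green_kwh, on_peak, capacity_kwh):
    """Greedily place required energy into slots, cheapest source first.

    Returns (green_plan, brown_plan, unscheduled_kwh), one entry per slot.
    """
    n = len(green_kwh)
    green_plan = [0.0] * n
    brown_plan = [0.0] * n
    remaining = required_kwh

    # Step 1: use the predicted green energy first.
    for t in range(n):
        use = min(green_kwh[t], capacity_kwh, remaining)
        green_plan[t] = use
        remaining -= use

    # Step 2: cheap (off-peak) brown energy; step 3: expensive (on-peak).
    for want_on_peak in (False, True):
        for t in range(n):
            if on_peak[t] != want_on_peak:
                continue
            free = capacity_kwh - green_plan[t] - brown_plan[t]
            use = min(free, remaining)
            brown_plan[t] += use
            remaining -= use

    return green_plan, brown_plan, remaining


def active_servers(power_kw, server_kw, total_servers):
    """Convert the power planned for the current slot into a server count."""
    return min(total_servers, math.ceil(power_kw / server_kw))


# Hypothetical one-hour slots: off-peak, on-peak, on-peak, off-peak.
green_plan, brown_plan, unscheduled = assign_energy(
    required_kwh=10.0,
    green_kwh=[0.0, 3.0, 4.0, 0.0],
    on_peak=[False, True, True, False],
    capacity_kwh=4.0,
)
servers_now = active_servers(green_plan[0] + brown_plan[0], 0.25, 16)
```

In this example all green energy is used, the remaining 3 kWh lands in the first off-peak slot, and no on-peak brown energy is needed.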

Page 11: 1. Computation scheduling

As time goes by… the number of active servers changes

[Figure: power and active servers vs. time, starting at Now]

Page 12: 2. Data management

• Deactivate servers to save energy
– Some data might become unavailable
• Prior solution: covering subset [Leverich’09]
– Set of servers always running has ALL data
• Our approach
– Only required data has to be available
– We usually require fewer active servers

[Figure: covering subset, showing data blocks 1 to 8 replicated across servers]
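The difference between a covering subset (servers that together hold all data) and keeping only the required data available can be shown with a greedy set-cover heuristic. This is a sketch under stated assumptions: the block layout, the server names, and the greedy strategy itself are hypothetical illustrations, not the method described in the talk or in [Leverich’09].

```python
def choose_active_servers(placement, needed):
    """Greedily pick servers until every needed block is on an active server.

    placement -- dict mapping server name to the set of blocks it stores
    needed    -- blocks that must stay available
    """
    uncovered = set(needed)
    active = []
    while uncovered:
        # Pick the server that covers the most still-uncovered blocks
        # (ties broken by server name, since sorted() fixes the order).
        best = max(sorted(placement), key=lambda s: len(placement[s] & uncovered))
        gained = placement[best] & uncovered
        if not gained:
            raise ValueError(f"no server holds blocks {sorted(uncovered)}")
        active.append(best)
        uncovered -= gained
    return active


# Hypothetical layout of blocks 1..8 over five servers.
placement = {
    "s1": {1, 2, 7},
    "s2": {3, 4, 5, 6},
    "s3": {4, 6},
    "s4": {2, 3, 4, 8},
    "s5": {3, 6, 7},
}

all_blocks = set().union(*placement.values())
covering = choose_active_servers(placement, all_blocks)         # ALL data
required_only = choose_active_servers(placement, {1, 4, 5, 6})  # queued jobs' data
```

With this layout, covering every block needs three servers, while covering only the blocks the queued jobs read needs two.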

Page 13: 2. Data management

[Figure: Server 1 to Server 5, each holding required and non-required files; server states are Active, Decommission, and Down. Running queue: JobA (4), JobB (5), JobC (1, 6)]

Page 14: 2. Data management

GreenHadoop (computation) requires only 2 servers

[Figure: Server 1 to Server 5, each holding required and non-required files; server states are Active, Decommission, and Down. Running queue: JobA (4), JobB (5), JobC (1, 6)]

Page 15: 2. Data management

Move required files to Active servers

[Figure: Server 1 to Server 5 in states Active, Decommission, and Down; a required file is replicated to an Active server. Running queue: JobA (4), JobB (5), JobC (1, 6)]

Page 16: 2. Data management

Decommissioned server can be sent to Down

[Figure: Server 1 to Server 5, each holding required and non-required files; server states are Active, Decommission, and Down. Running queue: JobA (4), JobB (5), JobC (1, 6)]
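The sequence on the last few slides (pick the active servers, replicate any required file they lack, then shut the other servers down) can be sketched as a small planning function. This is an illustrative sketch only: the file layout is hypothetical, the helper name `plan_transitions` is made up, and choosing the least-loaded active server as the copy target is an assumption, not a policy stated in the talk.

```python
def plan_transitions(placement, active, required):
    """Plan the copies needed before non-active servers can go down.

    Returns (copies, can_go_down), where copies is a list of
    (file, source_server, target_server) tuples.
    """
    available = set().union(*(placement[s] for s in active))
    load = {s: len(placement[s]) for s in active}
    copies = []
    for f in sorted(set(required) - available):
        # Find any server that currently stores the missing file.
        source = next((s for s in sorted(placement) if f in placement[s]), None)
        if source is None:
            raise ValueError(f"file {f} is not stored anywhere")
        # Assumption: copy to the active server holding the fewest files.
        target = min(active, key=lambda s: (load[s], s))
        copies.append((f, source, target))
        load[target] += 1
    can_go_down = sorted(s for s in placement if s not in active)
    return copies, can_go_down


# Hypothetical layout; the queued jobs need files 1, 4, 5, and 6.
placement = {
    "s1": {1, 2, 7},
    "s2": {3, 4, 5, 6},
    "s3": {4, 6},
    "s4": {2, 3, 4, 8},
    "s5": {3, 6, 7},
}
copies, can_go_down = plan_transitions(placement, ["s2", "s3"], {1, 4, 5, 6})
```

Here only file 1 is missing from the two active servers, so a single copy is enough before the other three servers are decommissioned and sent down.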

Page 17: 2. Data management

Jobs to be executed change → Required files change

[Figure: Server 1 to Server 5 in states Active, Decommission, and Down, with required and non-required files marked. Running queue: JobA (4), JobB (5), JobC (1, 6), JobD (8)]

Page 18: 2. Data management

Make missing data available

[Figure: Server 1 to Server 5 in states Active, Decommission, and Down, with required and non-required files marked. Running queue: JobB (5), JobC (1), JobD (8)]

Page 19: 2. Data management

GreenHadoop (computation) requires 3 servers

[Figure: Server 1 to Server 5 in states Active, Decommission, and Down, with required and non-required files marked. Running queue: JobB (5), JobC (1), JobD (8)]

Page 20: Evaluation methodology

• Cluster with 16 Xeon servers
– Hadoop and Hadoop turning off idle servers (EAHadoop)
– GreenHadoop: green energy, brown electricity cost
• Energy profile
– NJ electricity pricing (on/off peak and peak cost)
– Solar farm energy availability (14 PV panels)
– Five pairs of days (combinations of high and low days)
• Workload
– Derived from Facebook [Zaharia’09]
– Jobs of up to 37 GB, 600 tasks, and 6 hours in length
– Internal time bound of one day

Page 21: Energy prediction vs actual

[Figure: predicted vs. actual energy (kWh, 0.0 to 2.0) per hour from 6:00 AM to 7:00 PM]

[Figure: prediction error (%, 0 to 40) vs. hours ahead (0 to 48), annotated with rain, thunderstorm, and cloud cover]
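One simple way to turn a weather forecast into a solar energy prediction, and to measure its error against actual production, is sketched below. The model here (clear-sky production scaled by one minus the forecast cloud cover) and the error definition (relative error of total energy) are assumptions for illustration; the slide does not spell out the predictor or the error metric used, and all numbers are made up.

```python
def predict_solar_kwh(clear_sky_kwh, cloud_cover):
    """Scale each hour's clear-sky production by the forecast cloud cover.

    clear_sky_kwh -- energy the panels would produce with no clouds (kWh)
    cloud_cover   -- forecast cloud cover per hour, from 0.0 to 1.0
    """
    return [base * (1.0 - cover) for base, cover in zip(clear_sky_kwh, cloud_cover)]


def percent_error(predicted, actual):
    """Relative error of the total predicted energy, in percent."""
    total_actual = sum(actual)
    return 100.0 * abs(sum(predicted) - total_actual) / total_actual


# Hypothetical three-hour window.
predicted = predict_solar_kwh([1.0, 2.0, 1.0], [0.0, 0.5, 0.5])
error = percent_error(predicted, actual=[1.0, 0.8, 0.2])
```

Weather events such as the rain, thunderstorm, and cloud cover noted on the plot are the kind of input that would push the forecast, and hence the prediction, away from actual production.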

Page 22: GreenHadoop for Facebook & high-high days

[Figure: green predicted, green produced, green consumed, brown consumed, and brown price. Annotations: 30 kWh, 59 kWh, $8.00; 39 kWh, 25 kWh, $6.06 (-24%)]

31% more green, 39% cost savings

Page 23: GreenHadoop for Facebook

[Figure: Different pairs of days. Green energy increase and cost savings (%, 0 to 40) for High-High, High-Low, Low-High, Low-Low, and Very Low days]

[Figure: Effect of parameters in GreenHadoop. Green energy increase and cost savings (%, 0 to 40) for EAHadoop, Green, Green & Brown Energy, and Green & Brown Energy & Brown Peak]

Page 24: Other results

• Workload intensity (datacenter utilization)
• High-priority jobs
• Shorter time bounds
• Data availability
• Workload variations

• Consistent green energy increases and cost savings

Page 25: Conclusions

• Data-processing scheduler for green datacenters
• Predicts green energy availability
• Increases the use of green energy
• Reduces brown electricity costs
• Manages data availability

• We are building Parasol
– Solar-powered μdatacenter
– Poster session
