TRANSCRIPT
Power and Energy Conservation Techniques for
Disk Array Based Systems
Zvika Guz
[email protected]
June, 2004
2
Agenda Motivation
The server world's special characteristics Why the mobile world approach doesn’t work
3 different solutions Dynamically modulate the disk speed
Power Aware Cache
Popular Data concentration
Summary
4
Power consumption in Servers? Who cares?
Energy cost is growing by 25% annually. Power requirements will grow from 150 W/ft² to 300 W/ft².
Requires huge cooling infrastructure in order to dissipate this heat: two times the total power used by the computers is needed for HVAC
Web sites and servers combined will consume 40 TWh per year! $4 billion at $100 per MWh The output of more than 2 Hoover Dams, 24x7
Energy User News (EUN) predictions for 2005:
http://www.energyusernews.com/
5
Power consumption hurts Power becomes a considerable factor in the TCO of a data center
Power delivery
Cooling the system (air-condition)
High operating temperatures affect the stability and reliability of the system
Electricity production harms the environment
Disks' role in the power equation: 27% of the total energy consumed by a data center.
The biggest single load in the system
Storage demands are growing by 60% annually
The use of continuously growing RAID arrays keeps enlarging the disks' relative contribution
7
The Mobile World: Stop the disks from spinning during idle periods
[Disk power-state diagram: Working states (BUSY, IDLE, SEEK) ↔ Sleeping states (SPINDOWN, STANDBY, SPINUP)]
The spindle motor is the major power consumer. (81.34%)
The motor is used to spin the platters
This power is expended in IDLE periods too
8
The Mobile World: Stop the disks from spinning during idle periods
Most of the energy is expended during IDLE states
Even in high loads this ratio stays the same
Spinning the disk during IDLE state is a pure overhead
9
The Mobile World: Stop the disks from spinning during idle periods
The device should stay asleep long enough to compensate for the shutdown and wake-up overhead
Requests can't be served during the wake-up period (spin-up) Performance penalty – increased average response time
Numerous algorithms exist for mobile computers, predicting idle times using recent history
Naïve approach – stop the disk after a constant idle time
Adaptive algorithms
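The naïve fixed-timeout policy above can be sketched in a few lines. All names, thresholds, and energy numbers here are illustrative assumptions, not figures from the talk or a real disk datasheet:

```python
# Minimal sketch of the naive fixed-timeout spin-down policy.

def simulate_fixed_timeout(idle_gaps, timeout, spin_cost, idle_power):
    """Return (energy_saved, spinups) for a list of idle-gap lengths.

    The disk spins down after `timeout` seconds of idleness; each
    spin-down/spin-up cycle costs `spin_cost` joules, while staying
    spun down saves `idle_power` watts for the rest of the gap.
    """
    saved, spinups = 0.0, 0
    for gap in idle_gaps:
        if gap > timeout:                      # policy triggers
            saved += (gap - timeout) * idle_power - spin_cost
            spinups += 1
    return saved, spinups

# A gap only pays off when it exceeds timeout + spin_cost/idle_power
# (the break-even point); shorter gaps never trigger the policy.
saved, spinups = simulate_fixed_timeout([0.5, 2.0, 30.0], timeout=5.0,
                                        spin_cost=50.0, idle_power=5.0)
```

This makes the break-even argument concrete: with these toy numbers, only the 30-second gap saves anything, which is exactly why short server-side idle times defeat the scheme.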
10
Special Needs of the Server World
Servers experience extremely short idle times many transactions at the same time
Continuous request stream rather than intermittent activity
Light-load periods present the same behavior
Previous (recent) history does not provide good predictions of future idle times TPM schemes do not work that well
Performance degradation is usually unacceptable
Workload characteristics
Yet, this partition still holds:
11
High-end disk characteristics Spinning the disk up or down takes a very long time
The energy penalty of spinning the disk up and down is much more painful
Power consumption is larger Higher rpm
More, Heavier platters
Special Needs of the Server World
12
High-end disk characteristics
⇒ minimum sleeping time needed is significantly larger Allow enough time to spin the disk down and up
Must compensate for much larger overheads
⇒ Much larger impact on performance Large latency of spinning up the disk to serve a new request
Special Needs of the Server World
Spinning the disk up or down takes a very long time
The energy penalty of spinning the disk up and down is much more painful
14
DRPM: Dynamic Rotations Per Minute
“DRPM: Dynamic Speed Control for Power Management in Server Class Disks”, Sudhanva Gurumurthi, Anand Sivasubramaniam, Mahmut Kandemir, and Hubertus Franke. In Proceedings of the International Symposium on Computer Architecture (ISCA’03), June 9-11, 2003, San Diego, California, USA. IEEE CS Press, pp. 169-179. http://www.cse.psu.edu/~anand/csl/papers/isca03.pdf
“Reducing Disk Power Consumption in Servers with DRPM”, Sudhanva Gurumurthi, Anand Sivasubramaniam, Mahmut Kandemir, and Hubertus Franke. IEEE Computer, 36(12):59-66, December 2003. http://www.cse.psu.edu/~gurumurt/papers/ieee_comp03.pdf
15
Modulate the disk speed dynamically: SW can dynamically control the spindle motor (via a register)
A larger spectrum than the TPM on/off modes
Requests can be served at all speeds in the spectrum
DRPM: Main idea
Disk configuration and assumptions:
Rotation speed: 3600-12,000 rpm; 15 levels of rpm
Spindle motor power: P = (K_E · ω)² / R
Similar to the DVS equation (not really…): K_E = const, ω = rotation speed, R = motor resistance
Time to change RPM is proportional to the amplitude of the change
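Since spindle power is quadratic in rotation speed (P = (K_E·ω)²/R), even a modest rpm drop yields a large motor-power reduction. A quick sketch, where `k_e` and `r` are arbitrary placeholder constants, not values from the paper:

```python
import math

def spindle_power(rpm, k_e=1.0, r=1.0):
    """Spindle motor power, P = (K_E * omega)^2 / R (omega in rad/s)."""
    omega = rpm * 2.0 * math.pi / 60.0
    return (k_e * omega) ** 2 / r

# Power scales with rpm^2, so the lowest DRPM level draws only
# (3600/12000)^2 = 9% of the motor power of the highest level.
ratio = spindle_power(3600) / spindle_power(12000)
```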
16
The large “minimum sleeping time” needed Spinning the disk up or down takes a very long time
A huge energy penalty of spinning the disk up and down
The very short idle times in server workloads
DRPM - Advantages over TPM
19
DRPM - Advantages over TPM Exploits much shorter idle periods
No need to fully spin down the disk
saves time
Reduce the energy overhead (is it really? ) Reduces the wake up latency
Does not start from 0 rpm
Does not necessarily have to get to full rotation speed.
The bad predictability of future idle times.
The performance degradation due to the long wake up time. Provides the flexibility of dynamically choosing the operating point in the power-performance tradeoff Adjusts to the current workload rather than guessing a prediction (and failing…)
20
Each disk independently tries to lower its rpm down to a given watermark
Disks check periodically if their request queue is empty
If the request queue is empty, the disk reduces its rpm by one level
The disk can't reduce its rpm below the watermark
The array centralized controller tracks the total array performance
The controller optimizes the energy policy to the specific workload.
Each disk specific watermark is set by the centralized controller
Tracks response time for I/O requests
Periodically calculates the percentage change in response time over the past 2 periods
If ΔTperf > Upper-Tolerance-Level all watermarks are set to maximum
If ΔTperf < Lower-Tolerance-Level the watermark is further reduced
If LT <ΔTperf < UT the watermark is kept the same
DRPM Algorithm
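The two-level heuristic above can be sketched as follows. The tolerance values and the 600-rpm level granularity are assumptions for illustration; the paper's actual controller involves more bookkeeping:

```python
# Sketch of the two-level DRPM heuristic: per-disk rpm stepping plus a
# centralized watermark controller.

RPM_LEVELS = list(range(3600, 12001, 600))     # 15 levels, 3600-12000 rpm

def disk_step(level, queue_empty, watermark):
    """Per-disk rule: drop one rpm level when the request queue is
    empty, but never below the controller-assigned watermark level."""
    if queue_empty and level > watermark:
        return level - 1
    return level

def controller_step(watermark, delta_perf, upper_tol=0.05, lower_tol=0.01):
    """Array controller: adjust a disk's watermark from the fractional
    change in response time over the past periods."""
    if delta_perf > upper_tol:                 # too slow: force full speed
        return len(RPM_LEVELS) - 1
    if delta_perf < lower_tol:                 # headroom: allow lower rpm
        return max(watermark - 1, 0)
    return watermark                           # within tolerance: keep
```

The split mirrors the slide: disks react locally to empty queues, while the controller closes the loop on observed response time.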
26
DRPM Technology problems
Providing speed control
Head fly height
Head positioning Servo and Data Channel design Sampling frequency for servo control
Effect of DRPM on reliability and MTTF
Technology isn’t quite there yet
But we’re getting closer!
Sony’s Multimedia Hard Disk Drive: supports 2 rotational speeds
Pre-configured. (can’t be changed dynamically)
27
DRPM: Summary Dynamically control the rotational speed
Exploit the short idle periods
Allow better optimizations than TPM techniques
Reduce Energy without significant performance degradation
Technology isn’t quite there yet
The heuristic is very naïve! A lot of room for improvement
28
Agenda Motivation
The Severs world special characteristics Why the mobile world approach doesn’t work
3 different solutions Dynamically modulate the disk speed
Power Aware Cache
Popular Data concentration
Summary
29
Power Aware Cache “PB-LRU: A Self-Tuning Power Aware Storage Cache Replacement Algorithm for
Conserving Disk Energy”, Qingbo Zhu, Asim Shankar and Yuanyuan Zhou. In Proceedings of the 18th International Conference on Supercomputing (ICS’04), June 26-July 1, 2004, Saint-Malo, France. http://www-faculty.cs.uiuc.edu/~yyzhou/paper/ICS04.pdf
“Reducing energy consumption of disk storage using power-aware cache management”, Q. Zhu, F. M. David, C. F. Devaraj, Z. Li, Y. Zhou, and P. Cao. In 10th International Symposium on High Performance Computer Architecture (HPCA-10), February 14-18, 2004, Madrid, Spain. http://carmen.cs.uiuc.edu/paper/HPCA04.pdf
30
Power Aware Cache
File access frequencies are highly skewed Not all files enjoy the same popularity
The workload is not equally distributed among all disks
Fundamental observations:
31
Power Aware Cache
Cache policy directly affects the disk array’s energy consumption Generates the disk access sequences
Can increase disk idle time and provide more opportunities for energy saving
Power management schemes change I/O response time The I/O response time of an idle disk is huge (10.9 sec to spin up)
Cache miss penalty is not a constant any more
Awareness of the power management policy will yield better performance
The storage cache role
32
Main Idea
Power Aware Cache
The large “minimum sleeping time” needed Spinning the disk up or down takes a very long time
A huge energy penalty of spinning the disk up and down
The very short idle times in server workloads
The performance degradation due to the long wake up time
33
Main Idea
Power Aware Cache
Selectively keep blocks from “inactive” disks in the cache longer Extend the idle period length of those disks
Allow longer low-power modes
Divide the entire cache into separate partitions – one for each disk Each partition is managed using LRU
Find the partitioning that will minimize energy consumption Reduce the partition size for active disks
Increase the partition size for inactive disks
34
PB-LRU Algorithm For each disk, maintain the energy consumption that
would have been incurred under every partition size
Done at run-time
Periodically, find the energy-optimal partitioning A form of the Multiple Choice Knapsack Problem (MCKP)
NP-Hard
Naïve search space: numCacheSize^numDisks possible partitionings
35
Energy estimation for different cache sizes
PB-LRU Algorithm
Uses Mattson’s Stack algorithm Relies on the inclusion property of many replacement policies
For the same access sequence, the content of a cache of size k is a subset of the content of a cache of size k+1
Maintains the access history in a stack
An access to a block at depth i will be a miss for all cache sizes < i
Energy is estimated using: Previous cache miss time
Energy consumption until now
Knowledge of the DPM used by the disks
Determines the current power mode according to the idle period length
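The inclusion property means one pass with a single LRU stack yields hit/miss outcomes for every candidate partition size at once. A minimal sketch of the stack-distance computation (the energy bookkeeping layered on top of it is omitted; names are illustrative):

```python
def stack_distances(accesses):
    """Mattson's stack algorithm for LRU: for each access, return the
    stack depth of the touched block (inf on first touch). An access
    is a hit for every cache size >= its stack distance."""
    stack, dists = [], []
    for block in accesses:
        if block in stack:
            depth = stack.index(block) + 1     # 1-based depth from MRU
            stack.remove(block)
        else:
            depth = float("inf")               # cold miss for any size
        stack.insert(0, block)                 # touched block becomes MRU
        dists.append(depth)
    return dists

dists = stack_distances(["a", "b", "a", "c", "b"])
# The final access to "b" has distance 3: a miss for caches of size
# 1 or 2, a hit for any cache of size >= 3.
```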
36
Energy estimation for different cache sizes
PB-LRU Algorithm
37
Solving the MCKP problem
PB-LRU Algorithm
The Classical knapsack problem Given a set of items, each with a cost and a value, determine the number of each item to include in a collection so that the total cost is less than some given cost and the total value is as large as possible
Multiple Choice Knapsack Problem One or more groups of items
Exactly one item can be picked from each group
Approximate solution using dynamic programming Pseudo-polynomial time
O(numDisks · numCacheSize²)
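That dynamic program can be sketched with one MCKP group per disk, whose items are the candidate partition sizes with their estimated energies. The energy table below is made-up toy data, not from the paper:

```python
def mckp_min_energy(energy, cache_size):
    """energy[d][s] = estimated energy of disk d given partition size s
    (0 <= s <= cache_size). Choose exactly one size per disk with the
    sizes summing to at most cache_size, minimizing total energy.
    Runs in O(numDisks * cache_size^2) time."""
    INF = float("inf")
    best = [INF] * (cache_size + 1)   # best[u] = min energy using u cache units
    best[0] = 0.0
    for row in energy:                # one MCKP group per disk
        nxt = [INF] * (cache_size + 1)
        for used in range(cache_size + 1):
            if best[used] == INF:
                continue
            for size in range(cache_size - used + 1):
                cand = best[used] + row[size]
                if cand < nxt[used + size]:
                    nxt[used + size] = cand
        best = nxt
    return min(best)

# Toy instance: 2 disks, cache of 2 units; giving both units to disk 0
# (energy 1) and none to disk 1 (energy 10) is optimal.
best = mckp_min_energy([[10, 5, 1], [10, 8, 7]], cache_size=2)
```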
38
PB-LRU – Simulation Results
Energy Consumption Normalized to LRU
Infinite cache with oracle DPM serves as the lower bound
Up to 22% energy saving
[Chart: % energy consumption]
39
PB-LRU – Simulation Results
Average response time Normalized to LRU
There are much better algorithms than LRU for cache policies
Up to 50% reduction in response time
[Chart: % response time]
40
PB-LRU – Simulation Results
Cache partition size Increase the cache size of inactive disks,
Decrease the cache size of the active disks
The energy penalty of decreasing inactive disks’ partition size is very high
41
PB-LRU Soft Spots Performance issues:
The algorithm is oblivious to performance
Performance is a ‘by-product’ and not a design goal
Lack of detailed analysis on the effect of DPM on power oblivious caches
Average I/O access time with and without DPM
Breaks the hierarchy’s encapsulation
42
Power Aware Cache – Summary
Selectively keep blocks from “inactive” disks in the cache longer Extend the idle period length of those disks
Allow longer low-power modes.
Cache targeted to minimize energy consumption Must meet the performance requirements
Quickly adaptive, elegant, online algorithm
Very promising approach
44
Popular Data Concentration “Energy conservation techniques for disk array-based servers”,
Eduardo Pinheiro and Ricardo Bianchini. In Proceedings of the 18th International Conference on Supercomputing (ICS’04), June 26-July 1, 2004, Saint-Malo, France. http://www.cs.rutgers.edu/~ricardob/papers/ics04.ps.gz
"Massive Arrays of Idle Disks For Storage Archives", Dennis Colarelli and Dirk Grunwald. In Proceedings of the 15th High Performance Networking and Computing Conference, November 2002, Baltimore, Maryland. http://sc-2002.org/paperpdfs/pap.pap312.pdf
45
PDC: Popular Data Concentration
File access frequencies are highly skewed Not all files enjoy the same popularity
Fundamental observation:
Main Idea The large “minimum sleeping time” needed
Spinning the disk up or down takes a very long time
A huge energy penalty of spinning the disk up and down
The very short idle times in server workloads
46
PDC: Popular Data Concentration
Dynamically migrate popular data to a subset of the disks in the array
Skew the load towards a few of the disks Other disks will have longer idle times
Can be switched to a low power mode for longer periods
Applied periodically to adjust to data popularity changes
Limits the load of the popular disks to prevent performance degradation due to local congestion of accesses
[Chart axis: MBytes/second]
Main Idea
File access frequencies are highly skewed Not all files enjoy the same popularity
Fundamental observation:
47
PDC: Algorithm details
A sophisticated cache replacement algorithm Exploits both the recency and frequency features of a workload
Multiple LRU queues
Blocks stay in a queue for a given lifetime A block that wasn’t accessed during its lifetime is demoted to the next queue
After a block in Qi is accessed 2^i times it is promoted to Qi+1
Multi Queue (MQ) Algorithm
[Diagram: MQ queues Q0 … Qm-1]
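A minimal sketch of the MQ bookkeeping described above. The queue count, lifetime value, and class/method names are assumptions for illustration; the real policy also manages eviction:

```python
import math

NUM_QUEUES, LIFETIME = 4, 100    # queues Q0..Q3; ticks before demotion

class MQ:
    """Tracks each block's queue level from its access count."""
    def __init__(self):
        self.level = {}          # block -> current queue index
        self.count = {}          # block -> total access count
        self.expire = {}         # block -> demotion deadline (tick)

    def access(self, block, now):
        self.count[block] = self.count.get(block, 0) + 1
        # A block accessed 2^i times belongs in queue Qi.
        self.level[block] = min(int(math.log2(self.count[block])),
                                NUM_QUEUES - 1)
        self.expire[block] = now + LIFETIME

    def tick(self, now):
        # Demote blocks that were not accessed during their lifetime.
        for block, deadline in self.expire.items():
            if now >= deadline and self.level[block] > 0:
                self.level[block] -= 1
                self.expire[block] = now + LIFETIME
```

After four accesses a block sits in Q2 (2² = 4); if it then goes untouched for a lifetime, `tick` demotes it one queue at a time.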
48
PDC: Algorithm details Maintain block reference history in MQ rank list
Periodically migrate files to disk based on the MQ rank list Migration period is half an hour
Traverse the MQ list from head to tail (Qm-1 to Q0)
Migrate files to the same disk until reaching the maximum allowed load
Expected file load is estimated as (file size) / (average inter-access time)
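The migration pass above can be sketched as a greedy packing over the MQ-ranked file list. Names, units, and the load ceiling are illustrative assumptions, not the paper's actual code:

```python
def pdc_place(files, max_load, num_disks):
    """files: (name, size_mb, avg_interaccess_s) tuples, pre-sorted by
    MQ rank (most popular first). A file's expected load is
    size / average inter-access time (MB/s). Files are packed onto the
    first disk until its load ceiling is reached, then the next, so
    popular data concentrates on a few disks. Returns disk -> names."""
    layout = [[] for _ in range(num_disks)]
    disk, load = 0, 0.0
    for name, size_mb, gap_s in files:
        expected = size_mb / gap_s             # MB/s this file adds
        if load + expected > max_load and disk < num_disks - 1:
            disk, load = disk + 1, 0.0         # current disk is full
        layout[disk].append(name)
        load += expected
    return layout

# The hot file saturates disk 0; everything else lands on disk 1,
# which stays lightly loaded and can enter a low-power mode.
layout = pdc_place([("hot", 100, 1.0), ("warm", 100, 10.0),
                    ("cold", 100, 100.0)], max_load=100.0, num_disks=2)
```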
Two methods compared Spin down the disk after a fixed idleness period (TPM)
Using 2-speed disks
Disks under a low load switch to a lower rpm (a DRPM wannabe)
Power conservation techniques
49
TPM cannot be used Achieves energy savings only at very low request rates
Causes significant performance degradation
Many requests wait for disk spin-up
2-speed disk arrays perform very well under light loads 30-40% energy saving
2-5% deterioration in response time
PDC: Simulations Conclusions
50
The migration process adds unnecessary block transfers Pure energy overhead
Temporarily increases the disks’ load
The algorithm is far from perfect The estimation of a file’s contribution to the disk load is pretty lame
The fact that TPM is useless is fishy
The unbalanced load puts too much stress on a subset of the disks Reliability issues
PDC: Soft Spots
51
Dynamically migrate popular data to a subset of the disks in the array
Skew the load towards a few of the disks More power optimization opportunities for the unemployed disks
Integrates well with DRPM techniques
Very intuitive and hence very promising A lot of work left to be done
PDC: Summary
53
Summary
What have we seen DRPM
Dynamically change the rotation speed of the disk
Presents more energy savings by better exploiting the short idle times
Power Aware Cache techniques. (PB-LRU) Extend the idle periods of inactive disks
allow longer power-save modes
Popular Data Concentration (PDC) Skew the load towards a few of the disks
Create inactive disks with long idle periods
54
Summary All papers pinpoint and tackle the same problems
The very short idle times in server workloads
The performance degradation due to the long wake up time
The ICS’04 papers are based on the same basic solution Exploit the skew in the popularity of files and disks in the array
each paper weaves its own path, though
Each paper approaches a different level of the system DRPM works on the disk level
PB-LRU studies the cache
PDC alters the intermediate level (disk array centralized controller)
Negligible intersection with DRPM
55
Summary
Integration Time (Editor’s pick) All disks use DRPM
Storage cache applies the Power Aware Cache algorithm
Popular Data Concentration is used to further skew the load
The few busiest disks should be MEMS storage devices Can confront the very high load
Economical solution as only a few of those are used
56
Summary A super hot research field
Works are still preliminary
Many aspects were never touched RAID levels were not examined yet
Each niche has its own special needs
Different workloads with different behaviors
Simulations and results are not convincing Lack of simulation tools for huge disk arrays
One size doesn’t fit all
Thesis quality material Any questions?