TRANSCRIPT
Power and Energy Conservation Techniques for
Disk Array Based Systems
Zvika Guz
[email protected]
June, 2004
2
Agenda Motivation
The server world's special characteristics Why the mobile world approach doesn’t work
3 different solutions Dynamically modulate the disk speed
Power Aware Cache
Popular Data concentration
Summary
4
Power consumption in Servers? Who cares?
Energy cost is growing by 25% annually. Power requirements will grow from 150 W/ft² to 300 W/ft².
Requires huge cooling infrastructure in order to dissipate this heat: two times the total power used by the computers is needed for HVAC
Web sites and servers combined will consume 40 TWh per year! $4 billion at $100 per MWh The output of more than 2 Hoover Dams, 24x7
Energy User News (EUN) predictions for 2005:
http://www.energyusernews.com/
5
Power consumption hurts Power becomes a considerable factor in the TCO of a data center
Power delivery
Cooling the system (air-condition)
High operating temperatures affect the stability and reliability of the system
Electricity production harms the environment
Disks' role in the power equation: 27% of the total energy consumed by a data center.
The biggest single load in the system
Storage demands are growing by 60% annually
The use of continuously growing RAID arrays keeps enlarging the disks' relative contribution
7
The Mobile World: Stop the disks from spinning during idle periods
[Disk power-state diagram: Working states (BUSY, IDLE, SEEK) ↔ Sleeping states (SPINDOWN, STANDBY, SPINUP)]
The spindle motor is the major power consumer. (81.34%)
The motor is used to spin the platters
This power is expended in IDLE periods too
8
The Mobile World: Stop the disks from spinning during idle periods
Most of the energy is expended during IDLE states
Even in high loads this ratio stays the same
Spinning the disk during IDLE state is a pure overhead
9
The Mobile World: Stop the disks from spinning during idle periods
The device should stay asleep long enough to compensate for the shutdown and wake-up overhead
Requests can't be served during the wake-up period (spin-up) Performance penalty – increased average response time
Numerous algorithms exist for mobile computers, predicting idle times using recent history
Naïve approach – stop the disk after a constant idle time
Adaptive algorithms
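The naïve fixed-timeout policy above can be sketched in a few lines. All names, thresholds, and energy numbers here are illustrative assumptions, not figures from the talk or a real disk datasheet:

```python
# Minimal sketch of the naive fixed-timeout spin-down policy.

def simulate_fixed_timeout(idle_gaps, timeout, spin_cost, idle_power):
    """Return (energy_saved, spinups) for a list of idle-gap lengths.

    The disk spins down after `timeout` seconds of idleness; each
    spin-down/spin-up cycle costs `spin_cost` joules, while staying
    spun down saves `idle_power` watts for the rest of the gap.
    """
    saved, spinups = 0.0, 0
    for gap in idle_gaps:
        if gap > timeout:                      # policy triggers
            saved += (gap - timeout) * idle_power - spin_cost
            spinups += 1
    return saved, spinups

# A gap only pays off when it exceeds timeout + spin_cost/idle_power
# (the break-even point); shorter gaps never trigger the policy.
saved, spinups = simulate_fixed_timeout([0.5, 2.0, 30.0], timeout=5.0,
                                        spin_cost=50.0, idle_power=5.0)
```

This makes the break-even argument concrete: with these toy numbers, only the 30-second gap saves anything, which is exactly why short server-side idle times defeat the scheme.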
10
Special Needs of the Server World
Servers experience extremely short idle times many transactions at the same time
Continuous request stream rather than intermittent activity
Light-load periods present the same behavior
Previous (recent) history does not provide good predictions of future idle times TPM schemes do not work that well
Performance degradation is usually unacceptable
Workload characteristics
Yet, this partition still holds:
11
High-end disk characteristics Spinning the disk up or down takes a very long time
The energy penalty of spinning the disk up and down is much more painful
Power consumption is larger Higher rpm
More, Heavier platters
Special Needs of the Server World
12
High-end disk characteristics
⇒ minimum sleeping time needed is significantly larger Allow enough time to spin the disk down and up
Must compensate for much larger overheads
⇒ Much larger impact on performance Large latency of spinning up the disk to serve a new request
Special Needs of the Server World
Spinning the disk up or down takes a very long time
The energy penalty of spinning the disk up and down is much more painful
14
DRPM: Dynamic Rotations Per Minute
“DRPM: Dynamic Speed Control for Power Management in Server Class Disks”, Sudhanva Gurumurthi, Anand Sivasubramaniam, Mahmut Kandemir, and Hubertus Franke. In Proceedings of the International Symposium on Computer Architecture (ISCA’03), June 9-11, 2003, San Diego, California, USA. IEEE CS Press, pp. 169-179. http://www.cse.psu.edu/~anand/csl/papers/isca03.pdf
“Reducing Disk Power Consumption in Servers with DRPM”, Sudhanva Gurumurthi, Anand Sivasubramaniam, Mahmut Kandemir, and Hubertus Franke. IEEE Computer, 36(12):59-66, December 2003. http://www.cse.psu.edu/~gurumurt/papers/ieee_comp03.pdf
15
Modulate the disk speed dynamically: SW can dynamically control the spindle motor (via a register)
A larger spectrum than the TPM on/off modes
Requests can be served at all speeds in the spectrum
DRPM: Main idea
Disk configuration and assumptions:
Rotation speed: 3600-12,000 rpm; 15 levels of rpm
Spindle motor power: P = (K_E · ω)² / R
Similar to the DVS equation (not really…): K_E = const, ω = rotation speed, R = motor resistance
Time to change RPM is proportional to the amplitude of the change
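Since spindle power is quadratic in rotation speed (P = (K_E·ω)²/R), even a modest rpm drop yields a large motor-power reduction. A quick sketch, where `k_e` and `r` are arbitrary placeholder constants, not values from the paper:

```python
import math

def spindle_power(rpm, k_e=1.0, r=1.0):
    """Spindle motor power, P = (K_E * omega)^2 / R (omega in rad/s)."""
    omega = rpm * 2.0 * math.pi / 60.0
    return (k_e * omega) ** 2 / r

# Power scales with rpm^2, so the lowest DRPM level draws only
# (3600/12000)^2 = 9% of the motor power of the highest level.
ratio = spindle_power(3600) / spindle_power(12000)
```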
16
The large “minimum sleeping time” needed Spinning the disk up or down takes a very long time
A huge energy penalty of spinning the disk up and down
The very short idle times in server workloads
DRPM - Advantages over TPM
19
DRPM - Advantages over TPM Exploits much shorter idle periods
No need to fully spin down the disk
saves time
Reduce the energy overhead (is it really? ) Reduces the wake up latency
Does not start from 0 rpm
Does not necessarily have to get to full rotation speed.
The bad predictability of future idle times.
The performance degradation due to the long wake up time. Provides the flexibility of dynamically choosing the operating point in the power-performance tradeoff Adjusts to the current workload rather than guessing a prediction (and failing…)
20
Each disk independently tries to lower its rpm down to a given watermark
Disks check periodically if their request queue is empty
If the request queue is empty, the disk reduces its rpm by one level
The disk can't reduce its rpm below the watermark
The array centralized controller tracks the total array performance
The controller optimizes the energy policy to the specific workload.
Each disk specific watermark is set by the centralized controller
Tracks response time for I/O requests
Periodically calculates the percentage change in response time over the past 2 periods
If ΔTperf > Upper-Tolerance-Level all watermarks are set to maximum
If ΔTperf < Lower-Tolerance-Level the watermark is further reduced
If LT <ΔTperf < UT the watermark is kept the same
DRPM Algorithm
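The two-level heuristic above can be sketched as follows. The tolerance values and the 600-rpm level granularity are assumptions for illustration; the paper's actual controller involves more bookkeeping:

```python
# Sketch of the two-level DRPM heuristic: per-disk rpm stepping plus a
# centralized watermark controller.

RPM_LEVELS = list(range(3600, 12001, 600))     # 15 levels, 3600-12000 rpm

def disk_step(level, queue_empty, watermark):
    """Per-disk rule: drop one rpm level when the request queue is
    empty, but never below the controller-assigned watermark level."""
    if queue_empty and level > watermark:
        return level - 1
    return level

def controller_step(watermark, delta_perf, upper_tol=0.05, lower_tol=0.01):
    """Array controller: adjust a disk's watermark from the fractional
    change in response time over the past periods."""
    if delta_perf > upper_tol:                 # too slow: force full speed
        return len(RPM_LEVELS) - 1
    if delta_perf < lower_tol:                 # headroom: allow lower rpm
        return max(watermark - 1, 0)
    return watermark                           # within tolerance: keep
```

The split mirrors the slide: disks react locally to empty queues, while the controller closes the loop on observed response time.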
26
DRPM Technology problems
Providing speed control
Head fly height
Head positioning Servo and Data Channel design Sampling frequency for servo control
Effect of DRPM on reliability and MTTF
Technology isn’t quite there yet
But we’re getting closer!
Sony’s Multimedia Hard Disk Drive: supports 2 rotational speeds
Pre-configured. (can’t be changed dynamically)
27
DRPM: Summary Dynamically control the rotational speed
Exploit the short idle periods
Allow better optimizations than TPM techniques
Reduce Energy without significant performance degradation
Technology isn’t quite there yet
The heuristic is very naïve! A lot of room for improvement
28
Agenda Motivation
The Severs world special characteristics Why the mobile world approach doesn’t work
3 different solutions Dynamically modulate the disk speed
Power Aware Cache
Popular Data concentration
Summary
29
Power Aware Cache “PB-LRU: A Self-Tuning Power Aware Storage Cache Replacement Algorithm for
Conserving Disk Energy”, Qingbo Zhu, Asim Shankar and Yuanyuan Zhou. In Proceedings of the 18th International Conference on Supercomputing (ICS’04), June 26-July 1, 2004, Saint-Malo, France. http://www-faculty.cs.uiuc.edu/~yyzhou/paper/ICS04.pdf
“Reducing energy consumption of disk storage using power-aware cache management”, Q. Zhu, F. M. David, C. F. Devaraj, Z. Li, Y. Zhou, and P. Cao. In 10th International Symposium on High Performance Computer Architecture (HPCA-10), February 14-18, 2004, Madrid, Spain. http://carmen.cs.uiuc.edu/paper/HPCA04.pdf
30
Power Aware Cache
File access frequencies are highly skewed Not all files enjoy the same popularity
The workload is not equally distributed among all disks
Fundamental observations:
31
Power Aware Cache
Cache policy directly affects the disk array’s energy consumption Generates the disk access sequences
Can increase disk idle time and provide more opportunities for energy saving
Power management schemes change I/O response time The I/O response time of an idle disk is huge (10.9 sec to spin up)
Cache miss penalty is not a constant any more
Awareness of the power management policy will yield better performance
The storage cache role
32
Main Idea
Power Aware Cache
The large “minimum sleeping time” needed Spinning the disk up or down takes a very long time
A huge energy penalty of spinning the disk up and down
The very short idle times in server workloads
The performance degradation due to the long wake up time
33
Main Idea
Power Aware Cache
Selectively keep blocks from “inactive” disks in the cache longer Extend the idle period length of those disks
Allow longer low-power modes
Divide the entire cache into separate partitions – one for each disk Each partition is managed using LRU
Find the partitioning that will minimize energy consumption Reduce the partition size for active disks
Increase the partition size for inactive disks
34
PB-LRU Algorithm For each disk, maintain the energy consumption that
would have been incurred under every partition size
Done at run-time
Periodically, find the energy-optimal partitioning A form of the Multiple Choice Knapsack Problem (MCKP)
NP-Hard
Naïve search space: numCacheSize^numDisks possible partitionings
35
Energy estimation for different cache sizes
PB-LRU Algorithm
Uses Mattson’s Stack algorithm Relies on the inclusion property of many replacement policies
For the same access sequence, the content of a cache of size k is a subset of the content of a cache of size k+1
Maintains the access history in a stack
An access to a block at depth i will be a miss for all cache sizes < i
Energy is estimated using: Previous cache miss time
Energy consumption until now
Knowledge of the DPM used by the disks
Determines the current power mode according to the idle period length
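The inclusion property means one pass with a single LRU stack yields hit/miss outcomes for every candidate partition size at once. A minimal sketch of the stack-distance computation (the energy bookkeeping layered on top of it is omitted; names are illustrative):

```python
def stack_distances(accesses):
    """Mattson's stack algorithm for LRU: for each access, return the
    stack depth of the touched block (inf on first touch). An access
    is a hit for every cache size >= its stack distance."""
    stack, dists = [], []
    for block in accesses:
        if block in stack:
            depth = stack.index(block) + 1     # 1-based depth from MRU
            stack.remove(block)
        else:
            depth = float("inf")               # cold miss for any size
        stack.insert(0, block)                 # touched block becomes MRU
        dists.append(depth)
    return dists

dists = stack_distances(["a", "b", "a", "c", "b"])
# The final access to "b" has distance 3: a miss for caches of size
# 1 or 2, a hit for any cache of size >= 3.
```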
36
Energy estimation for different cache sizes
PB-LRU Algorithm
37
Solving the MCKP problem
PB-LRU Algorithm
The Classical knapsack problem Given a set of items, each with a cost and a value, determine the number of each item to include in a collection so that the total cost is less than some given cost and the total value is as large as possible
Multiple Choice Knapsack Problem One or more groups of items
Exactly one item can be picked from each group
Approximate solution using dynamic programming Pseudo-polynomial time
O(numDisks · numCacheSize²)
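That dynamic program can be sketched with one MCKP group per disk, whose items are the candidate partition sizes with their estimated energies. The energy table below is made-up toy data, not from the paper:

```python
def mckp_min_energy(energy, cache_size):
    """energy[d][s] = estimated energy of disk d given partition size s
    (0 <= s <= cache_size). Choose exactly one size per disk with the
    sizes summing to at most cache_size, minimizing total energy.
    Runs in O(numDisks * cache_size^2) time."""
    INF = float("inf")
    best = [INF] * (cache_size + 1)   # best[u] = min energy using u cache units
    best[0] = 0.0
    for row in energy:                # one MCKP group per disk
        nxt = [INF] * (cache_size + 1)
        for used in range(cache_size + 1):
            if best[used] == INF:
                continue
            for size in range(cache_size - used + 1):
                cand = best[used] + row[size]
                if cand < nxt[used + size]:
                    nxt[used + size] = cand
        best = nxt
    return min(best)

# Toy instance: 2 disks, cache of 2 units; giving both units to disk 0
# (energy 1) and none to disk 1 (energy 10) is optimal.
best = mckp_min_energy([[10, 5, 1], [10, 8, 7]], cache_size=2)
```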
38
PB-LRU – Simulation Results
Energy Consumption Normalized to LRU
Infinite cache with oracle DPM serves as the lower bound
Up to 22% energy saving
[Chart: % energy consumption]
39
PB-LRU – Simulation Results
Average response time Normalized to LRU
There are much better algorithms than LRU for cache policies
Up to 50% reduction in response time
[Chart: % response time]
40
PB-LRU – Simulation Results
Cache partition size Increase the cache size of inactive disks,
Decrease the cache size of the active disks
The energy penalty of decreasing inactive disks’ partition size is very high
41
PB-LRU Soft Spots Performance issues:
The algorithm is oblivious to performance
Performance is a ‘by-product’ and not a design goal
Lack of detailed analysis on the effect of DPM on power oblivious caches
Average I/O access time with and without DPM
Breaks the hierarchy’s encapsulation
42
Power Aware Cache – Summary
Selectively keep blocks from “inactive” disks in the cache longer Extend the idle period length of those disks
Allow longer low-power modes.
Cache targeted to minimize energy consumption Must meet the performance requirements
Quickly adaptive, elegant, online algorithm
Very promising approach
44
Popular Data Concentration “Energy conservation techniques for disk array-based servers”,
Eduardo Pinheiro and Ricardo Bianchini. In Proceedings of the 18th International Conference on Supercomputing (ICS’04), June 26-July 1, 2004, Saint-Malo, France. http://www.cs.rutgers.edu/~ricardob/papers/ics04.ps.gz
"Massive Arrays of Idle Disks For Storage Archives", Dennis Colarelli and Dirk Grunwald. In Proceedings of the 15th High Performance Networking and Computing Conference, November 2002, Baltimore, Maryland. http://sc-2002.org/paperpdfs/pap.pap312.pdf
45
PDC: Popular Data Concentration
File access frequencies are highly skewed Not all files enjoy the same popularity
Fundamental observation:
Main Idea The large “minimum sleeping time” needed
Spinning the disk up or down takes a very long time
A huge energy penalty of spinning the disk up and down
The very short idle times in server workloads
46
PDC: Popular Data Concentration
Dynamically migrate popular data to a subset of the disks in the array
Skew the load towards a few of the disks Other disks will have longer idle times
Can be switched to a low power mode for longer periods
Applied periodically to adjust to data popularity changes
Limits the load of the popular disks to prevent performance degradation due to local congestion of accesses
[Chart axis: MBytes/second]
Main Idea
File access frequencies are highly skewed Not all files enjoy the same popularity
Fundamental observation:
47
PDC: Algorithm details
A sophisticated cache replacement algorithm Exploits both the recency and frequency features of a workload
Multiple LRU queues
Blocks stay in a queue for a given lifetime A block that wasn’t accessed during its lifetime is demoted to the next queue
After a block in Qi is accessed 2^i times it is promoted to Qi+1
Multi Queue (MQ) Algorithm
[Diagram: MQ queues Q0 … Qm-1]
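A minimal sketch of the MQ bookkeeping described above. The queue count, lifetime value, and class/method names are assumptions for illustration; the real policy also manages eviction:

```python
import math

NUM_QUEUES, LIFETIME = 4, 100    # queues Q0..Q3; ticks before demotion

class MQ:
    """Tracks each block's queue level from its access count."""
    def __init__(self):
        self.level = {}          # block -> current queue index
        self.count = {}          # block -> total access count
        self.expire = {}         # block -> demotion deadline (tick)

    def access(self, block, now):
        self.count[block] = self.count.get(block, 0) + 1
        # A block accessed 2^i times belongs in queue Qi.
        self.level[block] = min(int(math.log2(self.count[block])),
                                NUM_QUEUES - 1)
        self.expire[block] = now + LIFETIME

    def tick(self, now):
        # Demote blocks that were not accessed during their lifetime.
        for block, deadline in self.expire.items():
            if now >= deadline and self.level[block] > 0:
                self.level[block] -= 1
                self.expire[block] = now + LIFETIME
```

After four accesses a block sits in Q2 (2² = 4); if it then goes untouched for a lifetime, `tick` demotes it one queue at a time.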
48
PDC: Algorithm details Maintain block reference history in MQ rank list
Periodically migrate files to disk based on the MQ rank list Migration period is half an hour
Traverse the MQ list from head to tail (Qm-1 to Q0)
Migrate files to the same disk until reaching the maximum allowed load
Expected file load is estimated as (file size) / (average inter-access time)
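The migration pass above can be sketched as a greedy packing over the MQ-ranked file list. Names, units, and the load ceiling are illustrative assumptions, not the paper's actual code:

```python
def pdc_place(files, max_load, num_disks):
    """files: (name, size_mb, avg_interaccess_s) tuples, pre-sorted by
    MQ rank (most popular first). A file's expected load is
    size / average inter-access time (MB/s). Files are packed onto the
    first disk until its load ceiling is reached, then the next, so
    popular data concentrates on a few disks. Returns disk -> names."""
    layout = [[] for _ in range(num_disks)]
    disk, load = 0, 0.0
    for name, size_mb, gap_s in files:
        expected = size_mb / gap_s             # MB/s this file adds
        if load + expected > max_load and disk < num_disks - 1:
            disk, load = disk + 1, 0.0         # current disk is full
        layout[disk].append(name)
        load += expected
    return layout

# The hot file saturates disk 0; everything else lands on disk 1,
# which stays lightly loaded and can enter a low-power mode.
layout = pdc_place([("hot", 100, 1.0), ("warm", 100, 10.0),
                    ("cold", 100, 100.0)], max_load=100.0, num_disks=2)
```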
Two methods compared Spin down the disk after a fixed idleness period (TPM)
Using 2-speed disks
Disks under a low load switch to a lower rpm (a DRPM wannabe)
Power conservation techniques
49
TPM cannot be used Achieves energy savings only at very low request rates
Causes significant performance degradation
Many requests wait for disk spin-up
2-speed disk arrays perform very well under light loads 30-40% energy saving
2-5% deterioration in response time
PDC: Simulations Conclusions
50
The migration process adds unnecessary block transfers Pure energy overhead
Temporarily increases the disks’ load
The algorithm is far from perfect The estimation of a file’s contribution to the disk load is pretty lame
The fact that TPM is useless is fishy
The unbalanced load puts too much stress on a subset of the disks Reliability issues
PDC: Soft Spots
51
Dynamically migrate popular data to a subset of the disks in the array
Skew the load towards a few of the disks More power optimization opportunities for the unemployed disks
Integrates well with DRPM techniques
Very intuitive and hence very promising A lot of work left to be done
PDC: Summary
53
Summary
What have we seen DRPM
Dynamically change the rotation speed of the disk
Presents more energy savings by better exploiting the short idle times
Power Aware Cache techniques. (PB-LRU) Extend the idle periods of inactive disks
allow longer power-save modes
Popular Data Concentration (PDC) Skew the load towards a few of the disks
Create inactive disks with long idle periods
54
Summary All papers pinpoint and tackle the same problems
The very short idle times in server workloads
The performance degradation due to the long wake up time
The ICS’04 papers are based on the same basic solution Exploit the skew in the popularity of files and disks in the array
each paper weaves its own path, though
Each paper approaches a different level of the system DRPM works on the disk level
PB-LRU studies the cache
PDC alters the intermediate level (disk array centralized controller)
Negligible intersection with DRPM
55
Summary
Integration Time (Editor’s pick) All disks use DRPM
Storage cache applies the Power Aware Cache algorithm
Popular Data Concentration is used to further skew the load
The few busiest disks should be MEMS storage devices Can confront the very high load
Economical solution as only a few of those are used
56
Summary A super hot research field
Works are still preliminary
Many aspects were never touched RAID levels were not examined yet
Each niche has its own special needs
Different workloads with different behaviors
Simulations and results are not convincing Lack of simulation tools for huge disk arrays
One size doesn’t fit all
Thesis quality material Any questions?