10-04-23
Challenge the future
Delft University of Technology
Overprovisioning for Performance Consistency in Grids
Nezih Yigitbasi and Dick Epema
Parallel and Distributed Systems Group, Delft University of Technology
http://guardg.st.ewi.tudelft.nl/
The Problem: Performance inconsistency in grids
• Inconsistent performance common in grids
  • bursty workloads
  • variable background loads
  • high rate of failures
  • highly dynamic & heterogeneous environment
[Figure: Bag-of-Tasks with 128 tasks submitted every 15 minutes; annotation: ~70X]
How can we provide consistent performance in grids?
Our goals
GOAL-1: Realistic performance evaluation of static and dynamic overprovisioning strategies (system’s perspective)
GOAL-2: Dynamically determine the overprovisioning factor (Κ) for user specified performance requirements (user’s perspective)
Outline
Overprovisioning Strategies
Experimental Setup
Results
Dynamically Determining Κ
Conclusions
Overprovisioning (I)
• Increasing the system capacity to provide better, and in particular, consistent performance even under variable workloads and unexpected demands
Pros
• simple
• obviates the need for complex algorithms
• easy to deploy & maintain
Cons
• cost-ineffective
• workloads may evolve (e.g., increasing user base)
• lowly-utilized systems
Overprovisioning (II)
• Preferred way of providing performance guarantees
  • typical data center utilization is no more than 15-50%
  • telecommunication systems have ~30% on average
L. A. Barroso and U. Hölzle, The Case for Energy-Proportional Computing, IEEE Computer, December 2007.
• High overprovisioning factors (Κ) are common in modern systems
  • Google: 450,000 (2005)
  • Microsoft: 218,000 (mid-2008)
  • Facebook: 10,000+ (2009)
Overprovisioning strategies
1. Static
   i. Largest
   ii. All
   iii. Number
   • Where should we deploy the resources?
   • Does it make any difference?
2. Dynamic
   • Dynamic overprovisioning, a.k.a. auto-scaling
   • low/high thresholds for acquiring/releasing resources
• Given Κ, it is straightforward to determine the number of processors for a strategy
[Figure: Static vs. Dynamic overprovisioning; axis: Time; labels: Demand, Waste]
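A minimal sketch of how the number of processors might follow from Κ. It assumes that Κ is the ratio of overprovisioned capacity to initial capacity, that Largest adds all extra processors to the largest cluster, that All spreads them evenly over the existing clusters, and that Number adds new clusters of a fixed size. These readings, the function names, and the cluster sizes are illustrative assumptions and are not taken from the slides.

```python
import math


def extra_processors(initial_total: int, k: float) -> int:
    """Processors to add so capacity becomes k times the initial capacity.

    Assumption: K is the ratio overprovisioned capacity / initial capacity.
    """
    return math.ceil(initial_total * k) - initial_total


def allocate_largest(clusters: list[int], k: float) -> list[int]:
    """Assumed 'Largest': put every extra processor in the largest cluster."""
    extra = extra_processors(sum(clusters), k)
    out = list(clusters)
    out[out.index(max(out))] += extra
    return out


def allocate_all(clusters: list[int], k: float) -> list[int]:
    """Assumed 'All': spread the extra processors evenly over all clusters."""
    extra = extra_processors(sum(clusters), k)
    share, remainder = divmod(extra, len(clusters))
    # The first `remainder` clusters receive one processor more.
    return [c + share + (1 if i < remainder else 0)
            for i, c in enumerate(clusters)]


def allocate_number(clusters: list[int], k: float,
                    new_cluster_size: int) -> list[int]:
    """Assumed 'Number': add new clusters of at most `new_cluster_size`."""
    extra = extra_processors(sum(clusters), k)
    out = list(clusters)
    while extra > 0:
        size = min(new_cluster_size, extra)
        out.append(size)
        extra -= size
    return out


# Hypothetical three-cluster system with 100 processors and K = 2.0.
clusters = [60, 30, 10]
print(allocate_largest(clusters, 2.0))     # [160, 30, 10]
print(allocate_all(clusters, 2.0))         # [94, 63, 43]
print(allocate_number(clusters, 2.0, 40))  # [60, 30, 10, 40, 40, 20]
```

All three allocations reach the same total capacity and differ only in where the processors are placed.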
System model
• DAS-3 multi-cluster grid
• Global Resource Managers (GRM) interacting with Local Resource Managers (LRM)
[Figure: GRM with a global queue; three LRMs with local queues; labels: global job, local jobs]
Workload
• Realistic workloads consisting of Bag-of-Tasks (BoT)
• Simulations using 10 workloads with 80% load
  • each workload has ~1650 BoTs and ~10K tasks
  • duration of each workload is between 1 day and 1 week
• Real background load trace
  • DAS-3 trace of June’08 (http://gwa.ewi.tudelft.nl/)
(Distribution parameters are determined after base-two log transformation)
Scheduling model
Methodology
• Compare the overprovisioned system with the initial system (NO)
• For Dynamic
  • acquisition takes 69 s (min) to 129 s (max); release takes 18 s (min) to 23 s (max)
  • 60%/70% for low/high thresholds
  • Κ varies over time, so for a fair comparison keep it within a ±10% range
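A minimal sketch of one threshold-based auto-scaling decision using the 60%/70% low/high thresholds above. The function name, the one-processor step, and the rule that busy processors are never released are illustrative assumptions. The slides do not specify the exact policy, and the acquisition and release delays are not modeled here.

```python
def autoscale_step(busy: int, total: int, low: float = 0.60,
                   high: float = 0.70, step: int = 1,
                   min_total: int = 1) -> int:
    """Return the new processor count after one scaling decision.

    Acquire `step` processors when utilization exceeds `high`;
    release `step` processors when it falls below `low`.
    """
    utilization = busy / total
    if utilization > high:
        return total + step
    # Assumption: never shrink below the busy count or below `min_total`.
    if utilization < low and total - step >= max(min_total, busy):
        return total - step
    return total


print(autoscale_step(busy=80, total=100))  # 101 (utilization 0.80 > 0.70)
print(autoscale_step(busy=50, total=100))  # 99  (utilization 0.50 < 0.60)
print(autoscale_step(busy=65, total=100))  # 100 (inside the 60%-70% band)
```

The gap between the two thresholds keeps the system from acquiring and releasing processors on every small fluctuation in load.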
Traditional performance metrics
• Makespan: time from the first task submitted to the last task done
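A minimal sketch of the makespan of one Bag-of-Tasks, computed as the time between the first task submission and the last task completion. The `(submit_time, finish_time)` tuple layout and the sample values are illustrative assumptions.

```python
def makespan(tasks):
    """Makespan of a bag of tasks.

    `tasks` is an iterable of (submit_time, finish_time) pairs,
    all expressed in the same time unit.
    """
    tasks = list(tasks)
    first_submitted = min(submit for submit, _ in tasks)
    last_done = max(finish for _, finish in tasks)
    return last_done - first_submitted


# Hypothetical BoT with three tasks (times in seconds).
print(makespan([(0, 50), (10, 120), (5, 90)]))  # 120
```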
Consistency metrics
• We define two metrics to capture the notion of consistency across two dimensions
• System gets more consistent as Cd gets closer to 1, Cs gets closer to 0
• A tighter range of the NSL is a sign of better consistency
Performance of scheduling policies
• ECT is the worst
• Dynamic Per Task is the best
Performance of different strategies
[Figures: Different Overprovisioning Factors (Κ); Different Strategies]
• Consistency obtained with overprovisioning is much better than the initial system (NO)
• Static strategies provide similar performance (only Κ matters)
• All and Largest are viable alternatives to Number as Number increases the administration, installation, and maintenance costs
• Dynamic strategy has better performance compared to static strategies
• Κ = 2.5 is the critical value
Cost of different strategies
• Use CPU-Hours
  • time a processor is used [h]
  • round up partial instance-hours to one hour, similar to the Amazon EC2 on-demand instances pricing model
• Significant reduction, as high as ~40%, in cost
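A minimal sketch of the CPU-hour accounting described above, in which every partial hour of use is rounded up to a full hour. The function name and the per-processor usage list are illustrative assumptions.

```python
import math


def billed_cpu_hours(usage_seconds):
    """Total CPU-hours, rounding each processor's usage up to whole hours.

    `usage_seconds` holds one entry per processor usage period;
    unused processors (0 seconds) are not billed.
    """
    return sum(math.ceil(seconds / 3600)
               for seconds in usage_seconds if seconds > 0)


# Hypothetical usage: exactly 1 h, 1 h + 1 s, 10 s, and an unused processor.
print(billed_cpu_hours([3600, 3601, 10, 0]))  # 1 + 2 + 1 + 0 = 4
```

Under this rounding, processors that are held for only a short time after the hour boundary cost as much as processors used for the full hour.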
Determining Κ dynamically
• So far system’s perspective, now user’s perspective
• How can we dynamically determine Κ given the user performance requirements?
• We use a simple feedback-control approach to deploy additional resources dynamically to meet user performance requirements
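A minimal sketch of a feedback step that tunes Κ toward a user-specified makespan range. The slides do not give the control law, so the fixed step size, the bounds on Κ, and all names below are illustrative assumptions rather than the controller used in the study.

```python
def update_k(k: float, observed_makespan: float, lower: float, upper: float,
             step: float = 0.1, k_min: float = 1.0,
             k_max: float = 5.0) -> float:
    """Return the next overprovisioning factor.

    Raise K when the observed makespan exceeds the upper requirement,
    lower it when the makespan is below the lower requirement, and
    keep it unchanged inside the [lower, upper] range.
    """
    if observed_makespan > upper:
        k += step   # too slow: deploy more resources
    elif observed_makespan < lower:
        k -= step   # faster than required: release resources
    return min(k_max, max(k_min, k))


# Hypothetical run against a [250, 300] minute requirement.
k = 2.0
for observed in [400, 350, 310, 280, 240]:
    k = update_k(k, observed, lower=250, upper=300)
print(round(k, 1))  # 2.2
```

Here `observed_makespan` could be an average over recently completed BoTs, which would be one way to use historical performance data in the feedback loop.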
Evaluation
• Simulated DAS-3 without background load
• ~1.5 month workload consisting of ~33K BoTs
• Empirically show that the controller stabilizes
• Average makespan for the workload in the initial system (without the controller) is ~3120 minutes
• Three scenarios from tight to loose performance requirements
  • [250m-300m]
  • [700m-750m]
  • [1000m-1250m]
Results (I)
• Significant improvement, as high as ~65%, when the performance requirements are tight
• ~40%-50% improvement for loose performance requirements
Results (II)
[Figures: results for the [250m-300m], [700m-750m], and [1000m-1250m] scenarios]
Conclusions
GOAL-1: Realistic Performance Evaluation of Different Strategies
• Overprovisioning improves performance consistency significantly
• Static strategies provide similar performance (only Κ matters)
• Dynamic strategy performs better than the static strategies
• Need to determine the critical value to maximize the benefit of overprovisioning
GOAL-2: Dynamically Determining Κ for Given User Performance Requirements
• Feedback-controlled system tuning Κ dynamically using historical performance data and specified performance requirements
• The number of BoTs meeting the performance requirements increases significantly, as high as 65%, compared to the initial system
More Information:
• Guard-g Project: http://guardg.st.ewi.tudelft.nl/
• PDS publication database: http://www.pds.twi.tudelft.nl
Thank you! Questions? Comments?
“[email protected]”
http://www.st.ewi.tudelft.nl/~nezih/