arash deshmeh, jacob machina, and angela c. sodan · downey model simple (only a and σto be...
TRANSCRIPT
Arash Deshmeh, Jacob Machina, and
Angela C. Sodan
University of Windsor, Canada
ADEPT Scalability Predictor
in Support of
Adaptive Resource Allocation
IPDPS 2010
Outline
� Background: Adaptive Resource Allocation
� Related Work
� Downey Runtime/Speedup Model
� The ADEPT Predictor
� Experimental Results
� Anomaly Detection
� Automated Reliability Judgment
� Summary and Conclusion
Background: Adaptive Resource Allocation
� Adaptive resource allocation: Up to 70% improvement in avg. response times by
� Reducing fragmentation
� Adapting to current load (low/high)
98% of applications said to be moldable
�Requires knowing jobs’ scalability / efficiency
but not practically available yet
In fact, it is a response-time function in dependence on CPU/core resources (Burton Smith)
Illustration of Adaptive Resource Allocation
Fragmentation reduction Adaptation to current load
time
Job 2(10 � 8)
Job 1
Nodes/CPUs/cores
Job 2 with
original Size 10� Run at higher efficiency with smaller
sizes if high load� Run at lower efficiency with larger
sizes of low load
size
Ideal
Real
Speedup
minN optN
maxN
More Background
� Benefits for user:
� Help in choosing job sizes tactically
� Determine maximum meaningful job sizes
(� our data about real applications)
� Relevance for resource allocation in:
� Clusters (MPI jobs)
� SMPs (OpenMP or MPI jobs)
� Virtual-machine resource provisioning
Related Work
� Most approaches are white-box (detailed model)
� Require tools: code instrumentation, compiler/OS support,
analysis of memory-access behavior, etc.
• Complex and computationally expensive
�Unsuitable for large-scale use in HPC centers
�Valuable for cross-site or new-platform performance
projection
• Black-box approaches (few observ. points, simple model)
�Easy-to-use and cheap
�Suffer from anomalies or non-uniform scalability patterns
Goals of ADEPT Scalability Predictor
� Goals of ADEPT� Achieve high prediction accuracy
� Provide computationally efficient approach
� Detect and automatically correct individual anomalies
� Detect and model non-uniform patterns (multi-phase)
� Perform reliability judgment with potential advice for outcome improvement
� Apply black-box prediction
� Based on Downey runtime/speedup model
Downey Model
� Simple (only A and σ to be learned)
� Needs few observation points
Speedup Curves, A varies
0
50
100
150
200
250
300
350
0 100 200 300 400
Speedup Curves, σ varies
0
50
100
150
200
250
300
0 100 200 300 400
Speedup curves for Downey model and a
typical application
0
20
40
60
80
100
120
140
160
0 100 200 300 400
Typical application
Downey model
Flat
Linear
Transitional
Declining
σ + (A+Aσ-σ)/n
σ +1nA(σ+1) / (σ(n+A-
1)+A)
A
1 ≤ n ≤ A+Aσ-σ
A+Aσ-σ ≤ n
High variance
(A-σ/2)/n + σ/2
σ(A-1/2)/n + 1 - σ/2
1
An / (A+(σ/2)(n-1))
An / (σ(A-1/2+n(1-σ/2))
A
1 ≤ n ≤ A
A ≤ n ≤ 2A-1
2A-1 ≤ n
Low variance
T(n)S(n)n rangeMode
ADEPT Predictor
1. Anomaly detection and scalability-pattern identification
2. Envelope derivation
3. Curve fitting4. Reliability judgment
Core of ADEPT
Core: Envelope Derivation
� Derives constraints from observations
� Calculates closed-form solutions (within certain
percentage of deviation) from pairs of observations
� Use lowest and highest bounds as overall envelope
Forming the Envelope
0
50
100
150
200
250
300
0 100 200 300 400 N
S
Range Pair 1
Range Pair 2
Range Pair 3
Core: Curve Fitting
� Prediction per target point, biased to closest observations
� Weighted least-squared relative errors
� Two-step
1. Closest point fixed
2. Extending variation by certain percentage within envelope
� Constraints from envelope and two-step curve fitting make ADEPT both accurate and fast
Speedup Prediction Using 4 Methods
0
50
100
150
200
0 100 200 300 400 500 N
S
Levm ar ADEPT / Exhaus tive / Genetic
Experimental Set-Up
� Experiments with MPI and OpenMP
� NAS benchmarks BT, CG, FT, LU, SP
� 7 real anonymous applications
(from administrator scalability tests)
� Both interpolation and extrapolation
� 3 to 4 input observation points
� Prediction of T(n) and S(n)
� T(1) not always available
Experimental Results: Speedup
NAS_FT
0
10
20
30
40
50
60
0 50 100 150
Standard Biased Weighting Predictions
Uniform Weighting Predictions
App_A
0
2
4
6
8
10
12
0 5 10 15 20 25
NAS_OMP_BT
0
1
2
3
4
5
6
0 10 20 30 40
NAS_OMP_CG
0
1
2
3
4
5
6
7
8
0 10 20 30 40
App_E
0
20
40
60
80
100
120
0 50 100 150 200 250 300
App_F
0
10
20
30
40
50
60
70
80
0 50 100 150
� Applied fitting approach better than non-weighted� Both interpolation and extrapolation work well� Most extrapolation still good on twice the number of nodes� Accuracy higher for closer extrapolation
Experimental Results: RuntimeNAS_BT
1
10
100
1000
0 50 100 150 200 250 300
NAS_CG
1
10
100
1000
0 50 100 150
NAS_FT
1
10
100
1000
0 50 100 150
App_B
1
10
100
1000
10000
0 500 1000 1500 2000
App_D
1
10
100
1000
10000
100000
0 50 100 150 200 250 300
App_E
1
10
100
1000
10000
100000
0 50 100 150 200 250 300
� Both interpolation and extrapolation work well� Whether T(1) available or not did not make any difference� Some predictions perfect match (App_A, App_C, App_G)� Accuracy higher for closer extrapolation
ADEPT Predictor
1. Anomaly detection and scalability-pattern
identification2. Envelope derivation
3. Curve fitting
4. Reliability judgment
Core of ADEPT
Anomaly Detection
� Serious deviations from model can be detected
(Application never fully conforms to model)
� Approach: fluctuation metric R
Ri = ((ti * ni/ni+1)/ti+1)*(1+(ni+1-ni)/ni+1)
(idea is relative speedup, normalized to distance)
Check whether Ri+1 > (1+ε)Ri
with ε being sensitivity factor
both Ri+1 and Ri are anomaly candidates
Individual Anomalous Points
Speedup curve, with anomalous point
0
20
40
60
80
100
120
0 50 100 150 200 250
R Metric Curve
0.80
1.00
1.20
1.40
1.60
1.80
2.00
2.20
0 50 100 150 200
R Metric Curves
0.80
1.00
1.20
1.40
1.60
1.80
2.00
2.20
0 50 100 150 200
Anomaly, NAS_SP
0
20
40
60
80
100
120
140
0 50 100 150 200 250 300
Anomaly, NAS_OMP_SP
0
1
2
3
4
5
6
7
8
9
0 10 20 30 40
Anomaly, Synthetic
0
20
40
60
80
100
120
140
160
0 50 100 150 200 250
• Minimum of 4 input points required• Check R curve after removal of anomaly candidate• If improvement, classify as anomaly point and reduce
its weight for curve fitting
Anomaly Patterns Stepwise NAS_OMP_FT
0
1
2
3
4
5
6
7
0 10 20 30 40
Stepwise NAS_OMP_FT, Fitted
0
1
2
3
4
5
6
7
8
9
0 10 20 30 40
Stepwise Synthetic, Fitted
0
50
100
150
200
250
300
0 50 100 150 200 250 300 350
Specially Optimized for 2^n Nodes, Fitted
0
10
20
30
40
50
60
0 50 100 150 200
Currently considered:• Stepwise scalability (minimum of 5 points required) � Model instance per phase
• Specially optimized for certain numbers of nodes, e.g. powers of two(minimum of 9 points required), regular anomalous points� Omit other points from curve fitting� Report suitable allocations
ADEPT Predictor
1. Anomaly detection and scalability-pattern identification
2. Envelope derivation
3. Curve fitting
4. Reliability judgment
Core of ADEPT
Automated Reliability Judgment
� All input points in linear section
� More input points needed (n ≥ A)
� High fitting error, not explainable as anomaly
� Report problem
� Runner-up problem (two or more model instances with greatly different A match)
� More input points needed (beyond current range)
Automatic Reliability Judgment (2)
High Fitting Error, NAS_LU
0
50
100
150
200
250
0 50 100 150 200 250 300
All Linear Speedup, App_C
0
5
10
15
20
25
30
35
0 10 20 30 40
� All 3 cases (linear, high-fitting error, runner-up) successfully detected
Runner-Up Model Instance, NAS_SP
1
10
100
1000
0 50 100 150 200 250
Summary and Conclusion
� ADEPT is accurate and efficient
� For both interpolation and extrapolation (if not too far away)
� Works well without serial time T(1)
� Performance similar to that reported in literature for white-box approaches
� Employs envelope derivation technique to constrain search during model fitting
� Biased model fitting with efficient two-level approach
� Anomaly detection based on fluctuation metric and automatic correction
� Warnings by reliability judgment if prediction uncertain
� Suitable for production environments
� Extrapolative scalability prediction as feedback to users
� Adaptive resource allocation