
Arash Deshmeh, Jacob Machina, and Angela C. Sodan
University of Windsor, Canada

ADEPT Scalability Predictor in Support of Adaptive Resource Allocation

IPDPS 2010


Outline

• Background: Adaptive Resource Allocation
• Related Work
• Downey Runtime/Speedup Model
• The ADEPT Predictor
• Experimental Results
• Anomaly Detection
• Automated Reliability Judgment
• Summary and Conclusion


Background: Adaptive Resource Allocation

• Adaptive resource allocation: up to 70% improvement in average response times by
  • Reducing fragmentation
  • Adapting to current load (low/high)
• 98% of applications said to be moldable
• Requires knowing jobs' scalability/efficiency, which is not practically available yet
  • In fact, what is needed is a response-time function that depends on CPU/core resources (Burton Smith)


Illustration of Adaptive Resource Allocation

[Figure: Fragmentation reduction; axes: time, Nodes/CPUs/cores; shows Job 1, Job 2 with original size 10, and Job 2 resized (10 → 8)]

[Figure: Adaptation to current load; Speedup vs. size; Ideal and Real curves, with minN, optN, and maxN marked]

• Run at higher efficiency with smaller sizes if high load
• Run at lower efficiency with larger sizes if low load


More Background

• Benefits for user:
  • Help in choosing job sizes tactically
  • Determine maximum meaningful job sizes (→ our data about real applications)
• Relevance for resource allocation in:
  • Clusters (MPI jobs)
  • SMPs (OpenMP or MPI jobs)
  • Virtual-machine resource provisioning


Related Work

• Most approaches are white-box (detailed model)
  • Require tools: code instrumentation, compiler/OS support, analysis of memory-access behavior, etc.
  • Complex and computationally expensive
  • Unsuitable for large-scale use in HPC centers
  • Valuable for cross-site or new-platform performance projection
• Black-box approaches (few observation points, simple model)
  • Easy to use and cheap
  • Suffer from anomalies or non-uniform scalability patterns


Goals of ADEPT Scalability Predictor

• Goals of ADEPT
  • Achieve high prediction accuracy
  • Provide computationally efficient approach
  • Detect and automatically correct individual anomalies
  • Detect and model non-uniform patterns (multi-phase)
  • Perform reliability judgment with potential advice for outcome improvement
• Apply black-box prediction
  • Based on Downey runtime/speedup model


Downey Model

• Simple (only A and σ to be learned)
• Needs few observation points

[Figure: Speedup Curves, A varies]

[Figure: Speedup Curves, σ varies]

[Figure: Speedup curves for Downey model and a typical application; curve sections labeled Linear, Transitional, Flat, and Declining]

| Mode | n range | S(n) | T(n) |
|---|---|---|---|
| Low variance | $1 \le n \le A$ | $\dfrac{An}{A + (\sigma/2)(n-1)}$ | $\dfrac{A - \sigma/2}{n} + \dfrac{\sigma}{2}$ |
| Low variance | $A \le n \le 2A-1$ | $\dfrac{An}{\sigma(A - 1/2) + n(1 - \sigma/2)}$ | $\dfrac{\sigma(A - 1/2)}{n} + 1 - \dfrac{\sigma}{2}$ |
| Low variance | $2A-1 \le n$ | $A$ | $1$ |
| High variance | $1 \le n \le A + A\sigma - \sigma$ | $\dfrac{nA(\sigma+1)}{\sigma(n + A - 1) + A}$ | $\dfrac{\sigma}{\sigma+1} + \dfrac{A + A\sigma - \sigma}{n(\sigma+1)}$ |
| High variance | $A + A\sigma - \sigma \le n$ | $A$ | $1$ |
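The piecewise Downey expressions for S(n) and T(n) can be written as a small Python sketch. The function names are illustrative and not part of ADEPT. The mode switch at σ ≤ 1 follows Downey's original formulation rather than anything stated on the slide, and T(n) is taken as the normalized runtime A / S(n), which is consistent with the slide's T(n) column.

```python
def downey_speedup(n: float, A: float, sigma: float) -> float:
    """Speedup S(n) of the Downey model.

    A is the average parallelism, sigma the variance in parallelism.
    Assumption: sigma <= 1 selects the low-variance mode.
    """
    if n < 1 or A < 1 or sigma < 0:
        raise ValueError("expected n >= 1, A >= 1, sigma >= 0")
    if sigma <= 1.0:
        # Low-variance mode: three ranges of n.
        if n <= A:
            return A * n / (A + (sigma / 2.0) * (n - 1))
        if n <= 2 * A - 1:
            return A * n / (sigma * (A - 0.5) + n * (1 - sigma / 2.0))
        return A
    # High-variance mode: two ranges of n.
    if n <= A + A * sigma - sigma:
        return n * A * (sigma + 1) / (sigma * (n + A - 1) + A)
    return A


def downey_runtime(n: float, A: float, sigma: float) -> float:
    """Normalized runtime T(n) = A / S(n); it reaches 1 once S(n) = A."""
    return A / downey_speedup(n, A, sigma)
```

Both modes give S(1) = 1 and are continuous at their range boundaries, which makes a convenient sanity check when changing the code.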


ADEPT Predictor

1. Anomaly detection and scalability-pattern identification
2. Envelope derivation (core of ADEPT)
3. Curve fitting (core of ADEPT)
4. Reliability judgment


Core: Envelope Derivation

• Derives constraints from observations
• Calculates closed-form solutions (within a certain percentage of deviation) from pairs of observations
• Uses lowest and highest bounds as overall envelope

[Figure: Forming the Envelope; speedup S vs. N, with Range Pair 1, Range Pair 2, and Range Pair 3]
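The idea of combining per-pair bounds into one envelope can be illustrated with a minimal sketch. The slide does not say which quantities ADEPT bounds or which closed forms it uses, so everything below is an assumption made for illustration: observations are taken to lie in the first low-variance range (n ≤ A), where T(n) is proportional to (1 - k)/n + k with k = σ/(2A), and the hypothetical helpers `k_from_pair`, `pair_bounds`, and `envelope` bound only that ratio k. Serial time T(1) is not needed because only runtime ratios enter.

```python
def k_from_pair(n1, t1, n2, t2):
    """Closed-form k = sigma / (2A) from two runtimes (assumes n1 < n2 <= A)."""
    r = t1 / t2
    return (1.0 / n1 - r / n2) / (r * (1.0 - 1.0 / n2) - (1.0 - 1.0 / n1))


def pair_bounds(n1, t1, n2, t2, deviation):
    """Range of k when each runtime may deviate by +/- deviation (a fraction)."""
    a = k_from_pair(n1, t1 * (1 + deviation), n2, t2 * (1 - deviation))
    b = k_from_pair(n1, t1 * (1 - deviation), n2, t2 * (1 + deviation))
    return min(a, b), max(a, b)


def envelope(nodes, times, deviation=0.05):
    """Overall envelope: lowest and highest bound over all observation pairs."""
    bounds = [
        pair_bounds(nodes[i], times[i], nodes[j], times[j], deviation)
        for i in range(len(nodes))
        for j in range(i + 1, len(nodes))
    ]
    lowest = min(lo for lo, _ in bounds)
    highest = max(hi for _, hi in bounds)
    return max(0.0, lowest), highest  # k cannot be negative


# Usage with synthetic runtimes generated for A = 64, sigma = 0.5.
k_true = 0.5 / (2 * 64)
sizes = [4, 8, 16, 32]
runtimes = [100.0 * ((1 - k_true) / n + k_true) for n in sizes]
print(envelope(sizes, runtimes, deviation=0.02))
```

A later fitting step could then restrict its search to parameter values inside the returned interval, which is the role the slide assigns to the envelope.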


Core: Curve Fitting

• Prediction per target point, biased to closest observations
• Weighted least-squares fit on relative errors
• Two-step
  1. Closest point fixed
  2. Extending variation by a certain percentage within envelope
• Constraints from envelope and two-step curve fitting make ADEPT both accurate and fast

[Figure: Speedup Prediction Using 4 Methods; speedup S vs. N; legend: Levmar; ADEPT / Exhaustive / Genetic]
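A weighted least-squares fit on relative errors, biased toward observations near the target size, might look like the sketch below. It is a simplification and not ADEPT's two-step procedure: the weighting function `closeness_weights`, the plain grid search, and the rectangular (A, σ) box standing in for the envelope are all assumptions chosen for illustration.

```python
def downey_speedup(n, A, sigma):
    """Downey speedup; sigma <= 1 is treated as the low-variance mode."""
    if sigma <= 1.0:
        if n <= A:
            return A * n / (A + (sigma / 2.0) * (n - 1))
        if n <= 2 * A - 1:
            return A * n / (sigma * (A - 0.5) + n * (1 - sigma / 2.0))
        return A
    if n <= A + A * sigma - sigma:
        return n * A * (sigma + 1) / (sigma * (n + A - 1) + A)
    return A


def closeness_weights(nodes, target):
    """Hypothetical bias: observations closer to the target size weigh more."""
    return [1.0 / (1.0 + abs(n - target) / target) for n in nodes]


def weighted_relative_error(A, sigma, nodes, speedups, weights):
    """Sum of weighted squared relative errors between model and observations."""
    total = 0.0
    for n, s, w in zip(nodes, speedups, weights):
        rel = (downey_speedup(n, A, sigma) - s) / s
        total += w * rel * rel
    return total


def fit_within_box(nodes, speedups, target, a_range, sigma_range, steps=100):
    """Grid search for (A, sigma) inside a box that stands in for the envelope."""
    weights = closeness_weights(nodes, target)
    best = None
    for i in range(steps + 1):
        A = a_range[0] + (a_range[1] - a_range[0]) * i / steps
        for j in range(steps + 1):
            sigma = sigma_range[0] + (sigma_range[1] - sigma_range[0]) * j / steps
            err = weighted_relative_error(A, sigma, nodes, speedups, weights)
            if best is None or err < best[0]:
                best = (err, A, sigma)
    return best[1], best[2]
```

Because the weights depend on the target, a separate fit is run for each size to be predicted, which mirrors the "prediction per target point" bullet. The tighter the box, the fewer grid points are needed.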


Experimental Set-Up

• Experiments with MPI and OpenMP
  • NAS benchmarks BT, CG, FT, LU, SP
  • 7 real anonymous applications (from administrator scalability tests)
• Both interpolation and extrapolation
• 3 to 4 input observation points
• Prediction of T(n) and S(n)
• T(1) not always available


Experimental Results: Speedup

[Figures: speedup predictions for NAS_FT, App_A, NAS_OMP_BT, NAS_OMP_CG, App_E, and App_F; legend: Standard Biased Weighting Predictions, Uniform Weighting Predictions]

• Applied fitting approach better than non-weighted
• Both interpolation and extrapolation work well
• Most extrapolation still good on twice the number of nodes
• Accuracy higher for closer extrapolation


Experimental Results: Runtime

[Figures: runtime predictions (logarithmic runtime axis) for NAS_BT, NAS_CG, NAS_FT, App_B, App_D, and App_E]

• Both interpolation and extrapolation work well
• Whether T(1) was available or not did not make any difference
• Some predictions are a perfect match (App_A, App_C, App_G)
• Accuracy higher for closer extrapolation


ADEPT Predictor

1. Anomaly detection and scalability-pattern identification
2. Envelope derivation (core of ADEPT)
3. Curve fitting (core of ADEPT)
4. Reliability judgment


Anomaly Detection

• Serious deviations from model can be detected
  (Application never fully conforms to model)
• Approach: fluctuation metric R
  $R_i = \dfrac{t_i \cdot n_i / n_{i+1}}{t_{i+1}} \cdot \left(1 + \dfrac{n_{i+1} - n_i}{n_{i+1}}\right)$
  (idea is relative speedup, normalized to distance)
• Check whether $R_{i+1} > (1+\varepsilon) R_i$, with $\varepsilon$ being the sensitivity factor
• If so, both $R_{i+1}$ and $R_i$ are anomaly candidates
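The fluctuation metric and the candidate check translate directly into code. The sketch below follows the formula on the slide; the function names and the default sensitivity `eps=0.1` are illustrative assumptions, since the slide gives no value for ε.

```python
def fluctuation_metric(nodes, times):
    """R_i for each pair of consecutive observations (n_i, t_i), (n_{i+1}, t_{i+1})."""
    if len(nodes) != len(times) or len(nodes) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    r = []
    for i in range(len(nodes) - 1):
        n_i, n_next = nodes[i], nodes[i + 1]
        t_i, t_next = times[i], times[i + 1]
        relative = (t_i * n_i / n_next) / t_next        # relative speedup
        r.append(relative * (1 + (n_next - n_i) / n_next))  # distance normalization
    return r


def anomaly_candidates(r, eps=0.1):
    """Indices into r flagged by the check R[i+1] > (1 + eps) * R[i]."""
    flagged = set()
    for i in range(len(r) - 1):
        if r[i + 1] > (1 + eps) * r[i]:
            flagged.update((i, i + 1))  # both R_i and R_{i+1} are candidates
    return sorted(flagged)


# Usage: the runtime at n = 4 is too high for an otherwise ideal scaling series.
sizes = [1, 2, 4, 8, 16]
runtimes = [100.0, 50.0, 40.0, 12.5, 6.25]
print(anomaly_candidates(fluctuation_metric(sizes, runtimes)))
```

Note that each R value spans two observations, so a flagged pair of R values points at the observation they share as the likely outlier.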


Individual Anomalous Points

[Figures: Speedup curve, with anomalous point; R Metric Curve; R Metric Curves; Anomaly, NAS_SP; Anomaly, NAS_OMP_SP; Anomaly, Synthetic]

• Minimum of 4 input points required
• Check R curve after removal of anomaly candidate
• If improvement, classify as anomaly point and reduce its weight for curve fitting
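The removal check can be sketched as follows. The slide does not define "improvement", so this sketch assumes it means fewer upward jumps in the R curve after the candidate observation is dropped. The candidate is taken to be the observation shared by the two flagged R values, and the `reduced_weight` and `eps` defaults are hypothetical.

```python
def fluctuation_metric(nodes, times):
    """R_i for consecutive observation pairs, as defined on the slide."""
    return [
        ((times[i] * nodes[i] / nodes[i + 1]) / times[i + 1])
        * (1 + (nodes[i + 1] - nodes[i]) / nodes[i + 1])
        for i in range(len(nodes) - 1)
    ]


def count_jumps(r, eps):
    """Number of positions where R[i+1] > (1 + eps) * R[i]."""
    return sum(1 for i in range(len(r) - 1) if r[i + 1] > (1 + eps) * r[i])


def anomaly_weights(nodes, times, eps=0.2, reduced_weight=0.1):
    """Per-observation fitting weights; confirmed anomalies get reduced_weight."""
    if len(nodes) < 4:
        raise ValueError("at least 4 input points are required")
    nodes, times = list(nodes), list(times)
    r = fluctuation_metric(nodes, times)
    baseline = count_jumps(r, eps)
    weights = [1.0] * len(nodes)
    for i in range(len(r) - 1):
        if r[i + 1] > (1 + eps) * r[i]:
            k = i + 1  # observation shared by R[i] and R[i+1]
            rest_n = nodes[:k] + nodes[k + 1:]
            rest_t = times[:k] + times[k + 1:]
            # Assumed notion of improvement: fewer jumps without point k.
            if count_jumps(fluctuation_metric(rest_n, rest_t), eps) < baseline:
                weights[k] = reduced_weight
    return weights
```

The returned weights could be multiplied into the per-point weights of a weighted least-squares fit, so that a confirmed anomaly still contributes but no longer dominates.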


Anomaly Patterns

[Figures: Stepwise NAS_OMP_FT; Stepwise NAS_OMP_FT, Fitted; Stepwise Synthetic, Fitted; Specially Optimized for 2^n Nodes, Fitted]

Currently considered:
• Stepwise scalability (minimum of 5 points required)
  • Model instance per phase
• Specially optimized for certain numbers of nodes, e.g. powers of two (minimum of 9 points required), regular anomalous points
  • Omit other points from curve fitting
  • Report suitable allocations
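For the power-of-two case, the "omit other points" and "report suitable allocations" steps can be sketched as below. How ADEPT recognizes the pattern from regular anomalous points is not described on the slide and is not reproduced here; the sketch assumes the pattern has already been identified, and all function names are hypothetical.

```python
def is_power_of_two(n: int) -> bool:
    """True for 1, 2, 4, 8, ..."""
    return n >= 1 and (n & (n - 1)) == 0


def split_by_power_of_two(nodes, speedups):
    """Separate observations to keep for curve fitting from those to omit."""
    kept, omitted = [], []
    for n, s in zip(nodes, speedups):
        (kept if is_power_of_two(n) else omitted).append((n, s))
    return kept, omitted


def suitable_allocations(max_nodes: int):
    """Power-of-two job sizes up to max_nodes, to report back to the user."""
    sizes, n = [], 1
    while n <= max_nodes:
        sizes.append(n)
        n *= 2
    return sizes
```

Only the kept observations would then be passed to the curve fit, and the reported sizes tell the user which allocations the prediction applies to.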


ADEPT Predictor

1. Anomaly detection and scalability-pattern identification
2. Envelope derivation (core of ADEPT)
3. Curve fitting (core of ADEPT)
4. Reliability judgment


Automated Reliability Judgment

• All input points in linear section
  • More input points needed (n ≥ A)
• High fitting error, not explainable as anomaly
  • Report problem
• Runner-up problem (two or more model instances with greatly different A match)
  • More input points needed (beyond current range)
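The three checks can be combined into a simple rule-based judgment, sketched below. The slide gives no thresholds, so `max_error`, `error_tolerance`, and `a_ratio` are hypothetical values, and the `Fit` record is an assumed way of passing candidate model instances around. The sketch also does not test whether a high error is explainable as an anomaly, which ADEPT checks first.

```python
from dataclasses import dataclass


@dataclass
class Fit:
    A: float       # fitted average parallelism
    sigma: float   # fitted variance in parallelism
    error: float   # fitting error of this model instance


def judge_reliability(nodes, fits, max_error=0.10, error_tolerance=0.02, a_ratio=2.0):
    """Return a list of warnings for the best fit among candidate model instances."""
    ranked = sorted(fits, key=lambda f: f.error)
    best = ranked[0]
    warnings = []
    # Case 1: every observed size lies below A, i.e. in the linear section.
    if max(nodes) < best.A:
        warnings.append("all input points in linear section: add points with n >= A")
    # Case 2: even the best instance fits poorly.
    if best.error > max_error:
        warnings.append("high fitting error: report problem")
    # Case 3: another instance fits about as well but has a very different A.
    for other in ranked[1:]:
        similar = other.error <= best.error + error_tolerance
        ratio = max(other.A, best.A) / min(other.A, best.A)
        if similar and ratio >= a_ratio:
            warnings.append("runner-up model instance: add points beyond current range")
            break
    return warnings
```

An empty result means none of the three conditions fired. Each warning carries the advice listed on the slide, so it can be passed on to the user unchanged.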


Automatic Reliability Judgment (2)

[Figures: High Fitting Error, NAS_LU; All Linear Speedup, App_C; Runner-Up Model Instance, NAS_SP]

• All 3 cases (linear, high fitting error, runner-up) successfully detected


Summary and Conclusion

• ADEPT is accurate and efficient
  • For both interpolation and extrapolation (if not too far away)
  • Works well without serial time T(1)
  • Performance similar to that reported in literature for white-box approaches
• Employs envelope derivation technique to constrain search during model fitting
• Biased model fitting with efficient two-level approach
• Anomaly detection based on fluctuation metric and automatic correction
• Warnings by reliability judgment if prediction uncertain
• Suitable for production environments
  • Extrapolative scalability prediction as feedback to users
  • Adaptive resource allocation