Evaluating Performance Information for Mapping Algorithms to Advanced Architectures Nayda G. Santiago, PhD, PE Electrical and Computer Engineering Department University of Puerto Rico, Mayaguez Campus Sept 1, 2006


Page 1:

Evaluating Performance Information for Mapping Algorithms to Advanced Architectures

Nayda G. Santiago, PhD, PE
Electrical and Computer Engineering Department
University of Puerto Rico, Mayaguez Campus

Sept 1, 2006

Page 2:

Outline

• Introduction
• Problems
• Methodology
  • Objectives
  • Previous Work
  • Description of Methodology
• Case Study
• Results
• Conclusions
• Future Work

Page 3:

Introduction

Problem solving on an HPC facility:
• Conceptualization
• Instantiation
• Mapping
• Parallel Implementation

Goal:
• Can we obtain metrics to characterize what is happening in the HPC system?
• Test a methodology for obtaining information from the HPC system.
• Compare with current results.

Page 4:

Introduction

[Diagram: Source Code → Compiler/Linker → Executable File → Running Program (linked against Libraries). The mapping process spans these stages, and the running program is observed through instrumentation and measurement.]

Page 5:

Introduction

Application programmer decisions:
• Programming paradigm
• Language
• Compiler
• Libraries
• Advanced architecture
• Programming style
• Algorithms

Page 6:

Problems

• Different factors affect the performance of an implementation.
• Information about high-level effects is lost in the mapping process:
  • Out-of-order execution
  • Compiler optimizations
• Complex interactions between parallel code and systems.
• Current performance analysis tools are not appealing.

Page 7:

Current Tuning Methodology

[Diagram: the programmer writes high-level code (choosing libraries, algorithms, programming paradigm, programming style, languages, and system configuration), runs it on the computer system, collects performance data with instrumentation tools, and evaluates it with analysis tools. This loop demands experience and knowledge of the tools, in-depth knowledge of the computer system, and an understanding of the relations between performance data and code, placing the burden on the programmer.]

Page 8:

New Tuning Methodology

[Diagram: high-level code runs on the computer system; instrumentation tools produce performance data; statistical data analysis turns the data into information; a knowledge-based system issues suggestions and experimentation alternatives; the programmer modifies the code accordingly, all within a problem solving environment.]

Page 9:

Proposed Tuning Methodology

[Diagram: high-level code → computer system → instrumentation tools → performance data → statistical data analysis → information → knowledge-based system → suggestions and experimentation alternatives → programmer modifies the code. The statistical data analysis stage is labeled "My work".]

Page 10:

Integrative Performance Analysis

• Abstraction: low-level information is hidden from the user's view (tools, high-level language, domain factors).
• Problem translation maps the user's view down to system levels and metrics (machine, OS, node, network).
• Measurement: low-level information is collected at those levels.
• Open question: how do we map the measurements back to the user's view?

Page 11:

Objectives

• Obtain information on the relation between low-level performance information and the factors that affect performance.
• Lessen the programmer's burden of incorporating experience and knowledge into the tuning process.
• Identify the most important metrics describing system-software interactions.
• Identify how many metrics convey most of the information about the system.

Page 12:

Methodology

Preliminary Problem Analysis → Design of Experiment → Data Collection → Data Analysis

Page 13:

Methodology

Preliminary Problem Analysis → Design of Experiment → Data Collection → Data Analysis

[Diagram: the four methodology steps overlaid on the proposed tuning loop — high-level code, computer system, instrumentation tools, performance data, statistical data analysis, information, knowledge-based system, suggestions and experimentation alternatives, programmer modifications.]

Page 14:

Preliminary Problem Analysis

Results
• Profiling is useful for preliminary analysis.

Contribution
• Screening is required to limit the number of factors in the experiment (feasibility).

Significance
• Due to the large number of factors affecting performance and the long running times, experimentation has not been commonly used for performance data evaluation. Screening makes it feasible.

[Diagram: inputs — application, performance goal, potential factors affecting performance; activities — understanding of the problem, evaluation of alternatives, screening experiment; output — factors for experimentation.]

Page 15:

Design of Experiment (DOE)

• Systematic planning of experiments:
  • Obtain the most information
  • Minimize the effect of extraneous factors
• Causal relations vs. correlational relations

[Diagram: design decisions — factors, levels of each factor, response variable, choice of design, order of treatments.]

Page 16:

Design of Experiment

Three basic principles:
• Replication
  • Estimate experimental error
  • Precision
• Randomization
  • Independence between observations
  • Average out the effect of extraneous factors
• Blocking
  • Block: a set of homogeneous experimental conditions
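The three principles above can be sketched in a short Python fragment (illustrative only — the factor names and levels below are hypothetical, not the ones used in the study):

```python
import itertools
import random

def build_run_order(factor_levels, replications, seed=0):
    """Full-factorial treatment list, replicated and randomized.

    factor_levels: dict mapping factor name -> list of levels
    (hypothetical factors; the deck does not name the actual ones).
    """
    # Every combination of factor levels is one treatment.
    treatments = [dict(zip(factor_levels, combo))
                  for combo in itertools.product(*factor_levels.values())]
    # Replication: repeat each treatment to estimate experimental error.
    runs = treatments * replications
    # Randomization: shuffle the run order to average out extraneous factors.
    random.Random(seed).shuffle(runs)
    return runs

# Example: 2 compilers x 2 optimization levels, 3 replications -> 12 runs.
order = build_run_order({"compiler": ["gcc", "icc"],
                         "opt": ["O2", "O3"]}, replications=3)
```

Blocking would be layered on top by grouping runs that share homogeneous conditions (e.g. the same node) before shuffling within each block.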

Page 17:

Design of Experiment

Results
• The appropriate randomization scheme, number of replications, and treatment order for the experimental runs.

Contributions
• Innovative use of DOE for establishing causal relations for application tuning.
• The appropriate design of experiment should be selected according to the performance analysis problem.

Significance
• The use of DOE and ANOVA will determine the cause of the performance differences in the results.

Page 18:

Data Collection

• Instrumentation is dependent on the system.
• Observable computing system: metrics can be observed in the system.

[Diagram: inputs — executable file, system, instrumentation tool; outputs — raw data, sampling, profiles.]

Page 19:

Data Collection

• Instrumentation: software and hardware
• Instrumentation tool setup
• Experimental runs and data collection

Page 20:

Data Collection

Results
• Measurements of the metrics observed from the system.
• Particular to this case study: between 36 and 52 metrics.

[Diagram: inputs — tool configuration, order of runs, crontab file, system; output — raw data (metrics).]

Page 21:

Data Analysis

Statistical analysis:
• Correlation matrix
• Multidimensional methods
  • Dimensionality estimation
  • Subset selection (entropy cost function)
• ANOVA
• Post hoc comparisons

[Diagram: raw data → data analysis → information.]

Page 22:

Data Analysis

[Pipeline: Raw Data → Convert Format → Performance Data Matrix; then Correlation Matrix, Normalize, Dimension / Subset Selection, ANOVA, and Post Hoc Comparisons, yielding Information.]

Page 23:

Data Analysis: Data Conversion

• Raw data: sampling and profiling.
• Performance data matrix:
  • Random process → reduced by averaging
  • Random variable

[Pipeline stage: Raw Data → Convert Format → Performance Data Matrix.]

Page 24:

Data Analysis: Data Conversion

Performance data matrix: m_a(k, p), where
• a: abs or avg
• k: experimental run
• p: metric identification number

The matrix is multidimensional:

  M = | m_a[0,0]    m_a[0,1]    ...  m_a[0,P-1]   |
      | m_a[1,0]    m_a[1,1]    ...  m_a[1,P-1]   |
      | ...                                       |
      | m_a[K-1,0]  m_a[K-1,1]  ...  m_a[K-1,P-1] |
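As a sketch, building this matrix from raw sampled data might look like the following (hypothetical metric names and sample values; averaging each sample stream yields one m_avg[k,p] entry):

```python
import numpy as np

def to_performance_matrix(raw_runs):
    """Convert K runs of sampled metrics into a K x P matrix.

    raw_runs: list of K dicts {metric_name: [samples...]}.
    Returns (M, metric_names); M[k, p] is the average of metric p in run k.
    """
    metrics = sorted(raw_runs[0])  # fixed column order across runs
    M = np.array([[np.mean(run[m]) for m in metrics] for run in raw_runs])
    return M, metrics

# Hypothetical raw data for two runs of two metrics.
raw = [{"ExecTime": [10.0, 10.2], "Pgfaults/s": [5.0, 7.0]},
       {"ExecTime": [12.0, 12.4], "Pgfaults/s": [9.0, 11.0]}]
M, names = to_performance_matrix(raw)  # M has shape (K=2, P=2)
```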

Page 25:

Data Analysis: Data Conversion

Performance data matrix example (rows are runs 0 to K-1, columns are metrics 0 to P-1):

  M = | ExecTime[0]    Pgfaults/s[0]    ...  IdleTime[0]   |
      | ExecTime[1]    Pgfaults/s[1]    ...  IdleTime[1]   |
      | ...                                                |
      | ExecTime[K-1]  Pgfaults/s[K-1]  ...  IdleTime[K-1] |

Page 26:

Data Analysis: Correlation Study

[Data analysis pipeline as before; current stage: Correlation Matrix.]

Page 27:

Data Analysis: Correlation Study

Correlation:
• A measure of linear relation among variables.
• Not causal.

[Pipeline stage: Performance Data Matrix → Correlation Matrix → Correlations.]

Page 28:

Data Analysis: Correlation Study

[Figure: example correlation matrix of ~37 performance metrics plotted as a heatmap, with the performance metric index on both axes and a color scale from -0.8 to 1.]

Page 29:

Data Analysis: Correlation Study

Correlation formula:

  r = ( Σ_{i=1}^{K} (x_i - x̄)(y_i - ȳ) ) / ( (K - 1) S_x S_y )

where S_x and S_y are the sample estimates of the standard deviations.

• Which metrics were most correlated with execution time?
• Results of the correlation analysis: collinearity (software instrumentation).
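A minimal numpy sketch of this correlation study — computing r between every metric column and a target column such as execution time (the data below is synthetic, not the study's measurements):

```python
import numpy as np

def corr_with_metric(M, target_col):
    """Pearson correlation of every column of M with column target_col,
    following r = sum((x_i - xbar)(y_i - ybar)) / ((K - 1) * S_x * S_y)."""
    K = M.shape[0]
    x = M[:, target_col]
    xc = x - x.mean()                 # centered target metric
    Mc = M - M.mean(axis=0)           # centered metric columns
    s = M.std(axis=0, ddof=1)         # sample standard deviations S_y
    sx = x.std(ddof=1)                # sample standard deviation S_x
    return (Mc.T @ xc) / ((K - 1) * s * sx)

# Synthetic 3-run, 3-metric matrix: column 1 tracks column 0 exactly,
# column 2 moves in the opposite direction.
M = np.array([[1.0, 2.0, 5.0],
              [2.0, 4.0, 4.0],
              [3.0, 6.0, 3.0]])
r = corr_with_metric(M, 0)
```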

Page 30:

Data Analysis: Normalization

[Data analysis pipeline as before; current stage: Normalize.]

Page 31:

Data Analysis: Normalization

The scales of the metrics vary widely, so the performance data matrix is normalized. Three schemes:

• Log normalization: n_a[k,p] = log(m_a[k,p])
• Min-max normalization: n_a[k,p] = (m_a[k,p] - min_k m_a[k,p]) / (max_k m_a[k,p] - min_k m_a[k,p])
• Dimension normalization: n_a[k,p] = m_a[k,p] / EuclNorm(m_a[·,p])

where the min, max, and Euclidean norm are taken over column p (one metric across all runs).

[Pipeline stage: Performance Data Matrix → Normalize → Normalized Performance Data Matrix.]
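The three normalization schemes can be written directly in numpy (a sketch; `M` is a K x P performance data matrix with runs as rows and metrics as columns):

```python
import numpy as np

def log_norm(M):
    """n[k,p] = log(m[k,p]); assumes strictly positive entries."""
    return np.log(M)

def min_max_norm(M):
    """Rescale each metric (column) to the range (0, 1)."""
    lo, hi = M.min(axis=0), M.max(axis=0)
    return (M - lo) / (hi - lo)

def euclidean_norm(M):
    """Divide each metric (column) by its Euclidean norm."""
    return M / np.linalg.norm(M, axis=0)

# Two runs of two metrics on very different scales.
M = np.array([[1.0, 10.0],
              [3.0, 30.0]])
```

After `euclidean_norm`, every column has unit Euclidean length, which is why scale differences between metrics no longer dominate the later multivariate steps.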

Page 32:

Data Analysis: Normalization

Normalization evaluation:
• Artificially assign classes to the data set: long execution time vs. short execution time.
• Used a visual separability criterion.
• Principal Component Analysis (PCA): project the data along the principal components and visualize the separation of the data.
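A compact sketch of this evaluation step — PCA via SVD of the centered matrix, projecting runs onto the first two principal components (illustrative data, not the study's):

```python
import numpy as np

def pca_project(M, n_components=2):
    """Project rows of M onto the top principal components.

    Uses the SVD of the centered data; the rows of Vt are the
    principal directions in order of decreasing variance.
    """
    Mc = M - M.mean(axis=0)
    U, S, Vt = np.linalg.svd(Mc, full_matrices=False)
    return Mc @ Vt[:n_components].T

# Two tight clusters (stand-ins for "short" vs "long" execution time runs).
X = np.array([[0.0, 0.0],
              [0.1, 0.0],
              [5.0, 5.0],
              [5.1, 5.0]])
proj = pca_project(X)  # scatter-plot proj[:, 0] vs proj[:, 1] to eyeball separation
```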

Page 33:

Data Analysis: Normalization

[Figure: projection onto the first two principal components, data not normalized (axis scales on the order of 10^4); the two classes are plotted as 1 and 2.]


Page 35:

Data Analysis: Normalization

[Figure: projection onto the first two principal components after min-max normalization to the range (0,1); the two classes are plotted as 1 and 2.]


Page 37:

Data Analysis: Normalization

[Figure: projection onto the first two principal components after normalizing with the Euclidean norm; the two classes are plotted as 1 and 2.]


Page 39:

Data Analysis: Normalization

Results
• Appropriate normalization scheme: Euclidean normalization.

Contribution
• Usage of normalization schemes for performance data.

Significance
• Due to the effect of differences in scale, some statistical methods may be biased. By normalizing, the results obtained will be due to the true nature of the problem and not caused by scale variations.

Page 40:

Data Analysis: Dimension Estimation

[Data analysis pipeline as before; current stage: Dimension Estimation.]

Page 41:

Data Analysis: Dimension Estimation

Dimensionality estimation: how many metrics will explain the system's behavior?
• Scree test: plot of the eigenvalues of the correlation matrix.
• Cumulative percentage of total variation: keep the components explaining most of the variance of the data.
• Kaiser-Guttman: keep the eigenvalues of the correlation matrix greater than one.

[Diagram: dimension estimation reduces P metrics to K metrics, K << P.]
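The Kaiser-Guttman and cumulative-percentage rules can be sketched as follows (illustrative; `R` is the metric correlation matrix, and the 90% variance target is an assumed threshold, not one stated in the slides):

```python
import numpy as np

def estimate_dimension(R, variance_target=0.9):
    """Two dimensionality estimates from a correlation matrix R.

    Returns (kaiser, cpv):
      kaiser - number of eigenvalues greater than one (Kaiser-Guttman),
      cpv    - smallest k whose eigenvalues explain variance_target
               of the total variation.
    """
    eig = np.sort(np.linalg.eigvalsh(R))[::-1]   # eigenvalues, descending
    kaiser = int(np.sum(eig > 1.0))
    cum = np.cumsum(eig) / eig.sum()             # cumulative fraction of variance
    cpv = int(np.searchsorted(cum, variance_target) + 1)
    return kaiser, cpv

# Three metrics that are all strongly inter-correlated: effectively
# one dominant dimension, two components to reach 90% of the variance.
R = np.array([[1.0, 0.8, 0.8],
              [0.8, 1.0, 0.8],
              [0.8, 0.8, 1.0]])
kaiser, cpv = estimate_dimension(R)
```

The scree test is the visual counterpart: plot `eig` against its index and look for the elbow.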

Page 42:

Data Analysis: Dimension Estimation Example

[Figure: scree plot of the correlation matrix eigenvalues, with the eigenvalue index (0-20) on the x-axis and the value (0-14) on the y-axis.]

Page 43:

Data Analysis: Dimension Estimation Results

• Dimension reduction to approximately 18% of the original size.
• All three methods give similar results.

Contribution
• Estimation of the dimension of performance data sets.

Significance
• Provides the minimum set of metrics that contains most of the information needed to evaluate the system.

Page 44:

Data Analysis: Metric Subset Selection

[Data analysis pipeline as before; current stage: Subset Selection.]

Page 45:

Data Analysis: Metric Subset Selection

Subset selection:
• Sequential forward search
• Entropy cost function:

  E = - Σ_{i=1}^{N} Σ_{j=1}^{N} [ S_ij log S_ij + (1 - S_ij) log(1 - S_ij) ]

  where S_ij is the similarity value of two instances i and j.

[Diagram: subset selection reduces P metrics to K metrics, K << P.]
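One plausible reading of this scheme in Python — a greedy sequential forward search scored by the entropy cost above, with a similarity S_ij derived from pairwise distances (the exact similarity definition and distance scaling are assumptions; the slides do not specify them):

```python
import numpy as np

def entropy_cost(X):
    """Entropy of a data set from pairwise similarities S_ij in (0, 1):
    E = -sum_ij [ S_ij log S_ij + (1 - S_ij) log(1 - S_ij) ]."""
    # Pairwise Euclidean distances between instances (rows).
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
    # Assumed similarity: exponential decay of the normalized distance.
    S = np.exp(-D / (D.max() + 1e-12))
    S = np.clip(S, 1e-12, 1 - 1e-12)          # keep logs finite
    s = S[np.triu_indices(len(X), k=1)]       # each pair counted once
    return -np.sum(s * np.log(s) + (1 - s) * np.log(1 - s))

def forward_select(M, k):
    """Sequential forward search: repeatedly add the metric (column)
    whose inclusion yields the lowest entropy cost."""
    chosen, remaining = [], list(range(M.shape[1]))
    while len(chosen) < k:
        best = min(remaining, key=lambda j: entropy_cost(M[:, chosen + [j]]))
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Toy 3-run, 3-metric matrix; select the 2 most informative metrics.
M = np.array([[0.0, 1.0, 5.0],
              [1.0, 2.0, 5.0],
              [2.0, 3.0, 4.0]])
sel = forward_select(M, 2)
```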

Page 46:

Data Analysis: Metric Subset Selection

Results
• Establishment of the most important metrics. For the case study:
  • Experiment 1: paging activity
  • Experiment 2: memory faults
  • Experiment 3: buffer activity
  • Experiment 4: a mix of metrics

Page 47:

Data Analysis: Metric Subset Selection Contributions

The usage of:
• Feature subset selection to identify the most important metrics.
• Entropy as a cost function for this purpose.

Significance
• The system is viewed as a source of information. If we can select metrics based on the amount of information they provide, we can narrow down the search for sources of performance problems.

Page 48:

Data Analysis: ANOVA

[Data analysis pipeline as before; current stage: ANOVA.]

Page 49:

Data Analysis: ANOVA

• Analysis of Variance (ANOVA): determines the cause of variations; tests a null hypothesis.
• Post hoc comparisons: applied after the null hypothesis is rejected.

[Diagram: raw data and the factors enter ANOVA, which asks "are there significant differences?"; if the factors cause variations, post hoc comparisons determine which level, and how.]
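A minimal sketch of the one-way ANOVA F statistic that underlies this step (numpy only; the group data are hypothetical measurements of one metric at two factor levels, not values from the study):

```python
import numpy as np

def one_way_anova_F(groups):
    """One-way ANOVA F statistic.

    groups: list of 1-D arrays, one per factor level. The F statistic
    compares between-group variance to within-group variance; a large F
    suggests the factor, not random noise, causes the differences.
    """
    all_data = np.concatenate(groups)
    grand = all_data.mean()
    k, N = len(groups), len(all_data)
    ss_between = sum(len(g) * (np.mean(g) - grand) ** 2 for g in groups)
    ss_within = sum(((g - np.mean(g)) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (N - k))

# Hypothetical runs of one metric at two factor levels.
groups = [np.array([10.0, 11.0, 10.5]),
          np.array([20.0, 21.0, 20.5])]
F = one_way_anova_F(groups)  # compare against F(k-1, N-k) to test the null hypothesis
```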

Page 50:

Data Analysis: ANOVA

Results
• The set of factors affecting the metric values, and the values themselves.

Contribution
• Use of ANOVA for the analysis of performance metrics.

Significance
• ANOVA makes it possible to identify whether variations in the measurements are due to the random nature of the data or to the factors. Incorrect conclusions may be reached if personal judgment is used instead.

Page 51:

Publications

• N. G. Santiago, D. T. Rover, and D. Rodriguez, "A Statistical Approach for the Analysis of the Relation Between Low-Level Performance Information, the Code, and the Environment", Information: An International Interdisciplinary Journal, Vol. 9, No. 3, May 2006, pp. 503-518.

• N. G. Santiago, D. T. Rover, and D. Rodriguez, "Subset Selection of Performance Metrics Describing System-Software Interactions", SC2002, Supercomputing: High Performance Networking and Computing 2002, Baltimore, MD, November 16-22, 2002.

• N. G. Santiago, D. T. Rover, and D. Rodriguez, "A Statistical Approach for the Analysis of the Relation Between Low-Level Performance Information, the Code, and the Environment", The 4th Workshop on High Performance Scientific and Engineering Computing with Applications (HPSECA-02), Proceedings of the International Conference on Parallel Processing Workshops, Vancouver, British Columbia, Canada, August 18-21, 2002, pp. 282-289.

Page 52:

Future Work

• Develop a means of providing feedback to the scientific programmer.
• Design a knowledge-based system; a PR system?
• Assign classes to performance outcomes and use a classifier.
• Compare different entropy estimators for performance data evaluation.
• Evaluate other subset selection schemes.
• Compare software versus hardware metrics.
• Compare different architectures and programming paradigms.

Page 53:

Questions?