MACHINE LEARNING FOR PROGRAM OPTIMIZATION Amin Shali and Akanksha Jain


Page 1

MACHINE LEARNING FOR PROGRAM OPTIMIZATION Amin Shali and Akanksha Jain

Page 2

Problem Domain

• Multi-variable optimization problems
  – Thread count
  – Blocking parameters
  – Pre-fetching distance
  – Scheduling
  – Inner-loop optimizations (unrolling size)
  – TLB behavior

Page 3

Problem Domain

• Unmanageable search space
• Brute-force search
  – Expensive and intractable
• Analytical models
• Machine learning
  – Efficient methods to search intelligently
  – Automatic model construction
  – Less domain-specific knowledge

Page 4

Machine Learning

• Supervised learning
  – K-nearest neighbors
  – Artificial neural networks
  – Support vector machines
  – Kernel machines
  – Statistical machine learning (Bayesian learning)
  – Decision trees
• Unsupervised learning
• Reinforcement learning

Page 5

A Case Study

• Exploiting parallelism
  – Discovering parallelism
  – Expressing parallelism
  – Mapping parallelism
• Fixed heuristics – fail across architectures
• Analytical models – require low-level hardware details
• Online learning models – expensive learning

"Mapping Parallelism to Multi-cores: A Machine Learning Based Approach" – Zheng Wang and Michael O'Boyle

Page 6

A Case Study

• The number of threads – Artificial Neural Network
• Selecting among four OpenMP scheduling policies – SVM (the first two policies are illustrated in the sketch below)
  – BLOCK: Iterations are divided into chunks of some fixed size (the number of iterations divided by the number of threads, rounded up to its ceiling). Each thread is assigned a separate chunk.
  – CYCLIC: Iterations are divided into chunks of size 1 and each chunk is assigned to a thread in round-robin fashion.
  – DYNAMIC: Iterations are divided into chunks of some fixed size. Chunks are dynamically assigned to threads on a first-come, first-served basis as threads become available.
  – GUIDED: Chunks are made progressively smaller until the default minimum chunk size (1) is reached.
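A minimal sketch of how the BLOCK and CYCLIC policies map loop iterations to threads, based only on the definitions above (illustrative Python, not the OpenMP runtime's actual implementation):

```python
import math

def block_schedule(n_iters, n_threads):
    """BLOCK: contiguous chunks of size ceil(n_iters / n_threads), one chunk per thread."""
    chunk = math.ceil(n_iters / n_threads)
    return [min(i // chunk, n_threads - 1) for i in range(n_iters)]

def cyclic_schedule(n_iters, n_threads):
    """CYCLIC: chunks of size 1, handed out to threads in round-robin order."""
    return [i % n_threads for i in range(n_iters)]

print(block_schedule(10, 3))   # [0, 0, 0, 0, 1, 1, 1, 1, 2, 2]
print(cyclic_schedule(10, 3))  # [0, 1, 2, 0, 1, 2, 0, 1, 2, 0]
```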

Page 7

Speedup on the Xeon platform for various scheduling policies, given the number of threads

Page 8

Speedup on Cell for the cyclic scheduling policy for different numbers of threads

Page 9

A Machine Learning Approach

Page 10

Feature Selection

• Features – the inputs of a machine learning model
• Start with features that might be important
• Small feature sets are better
  – Learning algorithms run faster
  – Less prone to over-fitting the training data
  – Useless features can confuse learning algorithms

Page 11

Feature Selection

Page 12

Artificial Neural Networks

• Computational model of biological neurons
• Network of simple processing elements
  – Complex behavior exhibited by their connections
• Capable of modeling both linear and non-linear functions
• Function approximation
• Classification
• Robust to noise

Page 13

Function Approximation

Page 14

Function Approximation

Page 15

Over-fitting

Page 16

A Multi-Layer Perceptron

Page 17

Learning with an ANN

• Feature selection
  – A set of variables representative of the output
  – Curse of dimensionality
• Supervised learning (a minimal training sketch follows below)
  – Example inputs with the correct output
  – Infer the function by minimizing mean-squared error
  – Back-propagation (gradient descent)
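A minimal sketch of that setup, assuming a generic feature vector per program; the feature names and the toy numbers here are purely hypothetical placeholders, not the paper's data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Hypothetical profiled features per program:
# [cycles per instruction, L1 miss rate, branch miss rate, loop iteration count]
X_train = np.array([[0.8, 0.02, 0.01, 1e6],
                    [1.5, 0.10, 0.03, 5e5],
                    [0.6, 0.01, 0.02, 2e6]])
y_train = np.array([3.4, 1.2, 3.9])   # measured speedup for some thread count

# One hidden layer; the weights are fit by gradient-based back-propagation,
# minimizing mean-squared error over the training examples.
model = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
model.fit(X_train, y_train)

print(model.predict([[0.9, 0.03, 0.01, 8e5]]))   # predicted speedup for a new program
```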

Page 18

Performance of the ANN-based model

Page 19

Support Vector Machines

• Supervised learning for classification
• Constructs hyper-planes in a higher-dimensional space to achieve good separation
• Maximum-margin hyper-plane
• Kernels for non-linear classifiers
• Drawback: multi-class classification is done by reducing the task to several binary problems (see the sketch below)
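A hedged sketch of that reduction using scikit-learn's one-vs-rest wrapper; the two-dimensional features and the labels here are made-up placeholders, not the study's real feature set:

```python
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

# Hypothetical per-program features and the best scheduling policy observed for each.
X = [[0.2, 0.5], [0.8, 0.1], [0.4, 0.9], [0.7, 0.6]]
y = ["BLOCK", "CYCLIC", "DYNAMIC", "GUIDED"]

# An RBF kernel gives a non-linear decision boundary; OneVsRestClassifier
# trains one binary SVM per class and picks the most confident class.
clf = OneVsRestClassifier(SVC(kernel="rbf", gamma="scale"))
clf.fit(X, y)

print(clf.predict([[0.3, 0.4]]))   # predicted policy for a new program
```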

Page 20

Page 21

Deployment

Page 22

Sensitivity to input

• Data-insensitive predictor
  – Profile the program once with each input
• Data-sensitive predictor
  – Profile the program once with the smallest input
  – Program behavior shows limited variability across data sets at a fine-grained parallel level

Page 23

Evaluation

• Stability and portability
  – Performance across all programs
  – Performance on different architectures
• Profiling overhead
  – Overhead in extracting features as compared to analytical models

Methodology – leave-one-out cross-validation (sketched below)
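A minimal sketch of leave-one-out cross-validation over a set of benchmark programs; train and evaluate are hypothetical helpers standing in for model construction and speedup measurement:

```python
def leave_one_out(programs, train, evaluate):
    """Hold out each program in turn, train on the rest, and score the held-out one."""
    scores = []
    for i, held_out in enumerate(programs):
        training_set = programs[:i] + programs[i + 1:]
        model = train(training_set)              # e.g., fit the ANN/SVM predictors
        scores.append(evaluate(model, held_out))  # e.g., measured vs. predicted speedup
    return scores
```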

Page 24

Performance on Xeon

Page 25

Profiling Cost

Page 26

"A Case for Machine Learning to Optimize Multicore Performance" – Archana Ganapathi, Kaushik Datta, Armando Fox, and David Patterson (2009)

Page 27

Problem Description

• Different scientific parallel algorithms
• Many multi-core architectures
• Compilers alone give low performance

Page 28

Solution: Auto Tuning

What it does:
• Identifies optimization features
• Searches the parameter space for the best-performing configuration

• Cross-platform
• Easily scalable
• Minimizes programmer tuning time

Page 29

Auto Tuning Cons

• The parameter space to explore is very large, even in constrained circumstances
  – Solution: heuristics to prune the space
• Only tries to minimize running time

Page 30

Statistical Machine Learning (SML)

• Draws inferences from automatically constructed models of data
• Methodologies include:
  – Bayesian learning
  – Markov and graphical models
  – Regression
  – Kernel methods
• Allows tuning for multiple metrics simultaneously

Page 31

Proposed Solution

Using SML techniques to explore a larger parameter space faster

Effectively identifying the relationships between optimization parameters and performance metrics

Technique used: Kernel Canonical Correlation Analysis (KCCA)

Page 32

Canonical Correlation Analysis (CCA)

• Finds linear combinations of two sets of variables that have the maximum correlation with each other
• Two sets of variables: X and Y
• Objective: find A and B such that the correlation of AX and BY is maximized (see the formulation below)
• Useful when the relationship between the variables can be captured in linear form
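In the standard formulation (stated here for reference; the slides give the objective only in words), the first pair of canonical directions a and b maximizes the correlation of the projections U = aᵀX and V = bᵀY:

$$
(a^{*}, b^{*}) \;=\; \arg\max_{a,\,b}\;
\frac{a^{\top}\Sigma_{XY}\,b}
     {\sqrt{a^{\top}\Sigma_{XX}\,a}\,\sqrt{b^{\top}\Sigma_{YY}\,b}}
$$

where $\Sigma_{XX}$ and $\Sigma_{YY}$ are the covariance matrices of X and Y and $\Sigma_{XY}$ is their cross-covariance; stacking several such direction pairs gives the matrices A and B used in the slides.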

Page 33

CCA

[Diagram: X is projected by A to U, and Y is projected by B to V; the goal is to maximize the correlation between U and V.]

Page 34

Kernel Canonical Correlation Analysis (KCCA)

• Seeks the same objective as CCA
• Used to find non-linear relationships between the two sets of variables
• Useful when the variables have non-linear relations
• Challenge: defining the kernel functions
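The underlying idea, stated as standard kernel-methods background rather than anything specific to this paper: a kernel function evaluates inner products in an implicit high-dimensional feature space,

$$
k(x_i, x_j) \;=\; \langle \varphi(x_i),\, \varphi(x_j) \rangle ,
$$

so KCCA can run CCA on the kernel (Gram) matrices $K_X$ and $K_Y$ of the two data sets instead of computing $\varphi(X)$ and $\varphi(Y)$ explicitly.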

Page 35

Kernel Trick!

[Diagram: points (x1, x2) and (y1, y2) that have a non-linear relation in the original space become linearly related after the kernel trick maps them into a higher-dimensional space (X1, X2, X3) and (Y1, Y2, Y3).]

Page 36

KCCA

[Diagram: X and Y are first mapped through kernel functions to φ(X) and φ(Y), then projected by A and B to U and V; the goal is to maximize the correlation between U and V.]

Page 37

… KCCA

[Diagram: after applying kernel functions to X and Y, the projections AX and BY in the projected space are maximally correlated.]

Page 38

Overview of the Problem

[Diagram: the scientific code and its configuration parameters run on a platform, yielding the measured performance parameters.]

Page 39

Configuration Parameters

More than 2^25 possible configurations

Page 40

Performance Metrics

Page 41

Case Study: Stencil Code

• Sample 1500 data points from the configuration space
• Run the stencil with the chosen configurations
• Identify the relationship between optimization configurations and performance
• Manipulate that relationship to find the best-performing configuration (a workflow sketch follows below)
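A hedged, end-to-end sketch of that workflow; run_benchmark and fit_model are hypothetical stand-ins for the stencil runs and the KCCA modeling step, and config_space is assumed to be a list of candidate configurations:

```python
import random

def autotune(config_space, run_benchmark, fit_model, n_samples=1500):
    """Sample configurations, measure them, model the relationship, pick a good one."""
    samples = random.sample(config_space, n_samples)
    runtimes = [run_benchmark(cfg) for cfg in samples]   # run the stencil code
    model = fit_model(samples, runtimes)                 # e.g., KCCA projections
    # Simplified: return the best sampled configuration; searching the learned
    # projected space could refine this further.
    best_cfg = min(zip(samples, runtimes), key=lambda pair: pair[1])[0]
    return best_cfg, model
```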

Page 42

Projecting Features

Page 43

Kernel Functions

• Euclidean distance is not appropriate; custom measures are used
• Numeric columns: Gaussian kernels (see the form below)
  – The kernel width is calculated based on the variance of the norms of the data points
• Non-numeric columns:
  – Similarity of X1 and X2 =
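For reference, the standard Gaussian kernel used for the numeric columns has the form below, with the bandwidth σ being the quantity the slide says is derived from the variance of the data-point norms:

$$
k(x_i, x_j) \;=\; \exp\!\left(-\,\frac{\lVert x_i - x_j \rVert^{2}}{2\sigma^{2}}\right)
$$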

Page 44

Projections

Page 45

… Projections

Page 46

Finding Optimal Configuration

Page 47

… Finding Optimal Configuration

Page 48

Genetic Algorithm

[Diagram: the genetic-algorithm loop – initial population → fitness evaluation → selection → reproduction → modification → back to fitness evaluation.]
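A compact sketch of that loop; fitness, crossover, and mutate are hypothetical problem-specific helpers, and this is generic genetic-algorithm pseudocode in Python rather than the paper's implementation:

```python
import random

def genetic_search(initial_population, fitness, crossover, mutate, generations=50):
    population = list(initial_population)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)       # fitness evaluation
        parents = ranked[: len(ranked) // 2]                         # selection
        children = [crossover(random.choice(parents), random.choice(parents))
                    for _ in range(len(population) - len(parents))]  # reproduction
        population = parents + [mutate(child) for child in children] # modification
    return max(population, key=fitness)
```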

Page 49

AMD Barcelona

Page 50

Intel Clovertown

Page 51

Intel Clovertown

Page 52

Performance vs Energy Efficiency

Page 53

Conclusion

• Feature extraction
• Generating training data is a time-consuming task and determines the prediction quality
• Tweaking of learning parameters
• Adaptive execution of programs for performance and energy
• Self-optimization
• Methods for a broader range of parallel applications