Page 1

Javier Cuenca, José González
Department of Ingeniería y Tecnología de Computadores

Domingo Giménez
Department of Informática y Sistemas

University of Murcia, SPAIN

Towards the Design of an Automatically Tuned Linear Algebra Library

Page 2

Current Situation of Linear Algebra Parallel Routines

Linear algebra: highly optimizable operations, but the optimizations are platform-specific.

Traditional method: hand-optimization for each platform
• Time-consuming
• Incompatible with hardware evolution
• Incompatible with changes in the system (architecture and basic libraries)
• Unsuitable for systems with variable workload
• Misuse by non-expert users

Page 3

Solutions to this situation?

Some groups and projects: ATLAS, GrADS, LAWRA, FLAME, I-LIB

But the problem is very complex.

Page 4

Our approach

Routines parameterised by system parameters and algorithmic parameters:
• System parameters: obtained at installation time, using an analytical model of the routine and simple installation routines, with a reduced number of executions at installation time
• Algorithmic parameters: obtained from the analytical model with the system parameters obtained in the installation process

Page 5

Our approach: the scheme

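[Figure: the scheme for a LAR (linear algebra routine). DESIGN (LAR-DESIGNER): MODELLING LAR, producing LAR-MOD; IMPLEMENTATION OF LAR-ERs, producing the LAR-ERs. INSTALLATION (SYSTEM MANAGER): EXECUTION OF LAR-ERs, with LAR-IF, BL and LAR-SPF; OAP SELECTION, with LAR-OAPF; LIBRARY INCLUSION PROCESS.]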

Page 6

Design: Modelling the LAR

[Figure: the scheme, DESIGN (LAR-DESIGNER): MODELLING LAR, producing LAR-MOD.]

Page 7

LAR-MOD: Analytical Model of LAR

The behaviour of the algorithm on the platform is defined by:

Texec = f(SPs, n, APs)
SPs = f(n, APs)

where SPs are the System Parameters, APs the Algorithmic Parameters and n the problem size.
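The relation SPs = f(n, APs) means that a system parameter need not be a single constant; for example, the time per flop of a blocked kernel can change with the block size b. A minimal sketch, assuming the installation stores that cost (called k3 here) for a few block sizes and interpolates linearly between them; the table values and the helper name `k3` are hypothetical:

```python
import bisect

# Hypothetical k3 values (seconds per flop) stored at installation time for a
# few block sizes b; the numbers are illustrative, not measurements.
K3_TABLE = {16: 4.0e-9, 32: 2.5e-9, 64: 2.0e-9, 128: 2.2e-9}


def k3(b: int, table: dict = K3_TABLE) -> float:
    """Return k3 for block size b: exact if measured, linearly interpolated
    between the two nearest measured sizes, clamped outside the range."""
    sizes = sorted(table)
    if b <= sizes[0]:
        return table[sizes[0]]
    if b >= sizes[-1]:
        return table[sizes[-1]]
    i = bisect.bisect_left(sizes, b)  # index of first measured size >= b
    if sizes[i] == b:
        return table[b]
    lo, hi = sizes[i - 1], sizes[i]
    frac = (b - lo) / (hi - lo)
    return table[lo] + frac * (table[hi] - table[lo])


if __name__ == "__main__":
    for b in (16, 48, 100, 256):
        print(b, k3(b))
```

Clamping outside the measured range is a conservative choice: extrapolating would trust the model beyond the block sizes actually measured.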

Page 8

LAR-MOD: Analytical Model of LAR

System Parameters (SPs): they reflect how the hardware platform (its physical characteristics and current conditions) and the basic libraries determine the LARs' performance.

Page 9

LAR-MOD: Analytical Model of LAR

Two kinds of SPs:
• Communication System Parameters (CSPs)
• Arithmetic System Parameters (ASPs)

Page 10

LAR-MOD: Analytical Model of LAR

Two kinds of SPs:
• Communication System Parameters (CSPs):
  ts: start-up time
  tw: word-sending time
• Arithmetic System Parameters (ASPs)
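With these two CSPs, the time to send a message of m words is usually modelled linearly. A worked instance, with ts and tw values chosen only for illustration (they are not measurements from these experiments):

```latex
t_{comm}(m) = t_s + m\,t_w
\qquad\text{e.g.}\quad
t_s = 50\,\mu\mathrm{s},\; t_w = 0.1\,\mu\mathrm{s/word},\; m = 1000
\;\Rightarrow\;
t_{comm} = 50 + 1000 \times 0.1 = 150\,\mu\mathrm{s}
```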

Page 11

LAR-MOD: Analytical Model of LAR

Two kinds of SPs:
• Communication System Parameters (CSPs)
• Arithmetic System Parameters (ASPs):
  tc: arithmetic cost. Using BLAS: k1, k2 and k3, the cost of an arithmetic operation performed with level-1, level-2 and level-3 BLAS, respectively
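One way these ASPs can enter an analytical model (a sketch assuming the flops of the routine can be split by BLAS level; the slides do not give this formula) is to weight the flops performed at each level by the corresponding cost. For instance, if essentially all the 2n^3/3 flops of an n × n LU factorization were performed with level-3 BLAS, only k3 would matter:

```latex
T_{arith} = k_1 F_1 + k_2 F_2 + k_3 F_3,
\qquad F_i = \text{flops performed with level-}i\text{ BLAS}

F_1 = F_2 \approx 0,\quad F_3 \approx \tfrac{2}{3} n^3
\;\Rightarrow\;
T_{arith} \approx \tfrac{2}{3} n^3 k_3
```

Without BLAS, tc plays the same role for all the flops F of the routine: T_arith = tc · F.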

Page 12

LAR-MOD: Analytical Model of LAR

How to estimate each SP?
1. Obtain the kernel of the LAR that determines its performance cost.
2. Make an Estimation Routine from this kernel.
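The two steps can be sketched for an arithmetic parameter as a routine that times the kernel on operands of a size similar to those the LAR uses and divides by its flop count. This is a minimal sketch under stated assumptions: the kernel is taken to be a matrix multiplication, the pure-Python triple loop only stands in for the real kernel (normally a BLAS call), and `naive_matmul` and `estimate_k` are hypothetical names:

```python
import time


def naive_matmul(a, b, c):
    """c += a * b for n x n lists of lists.
    Stand-in for the LAR's computation kernel (hypothetical)."""
    n = len(a)
    for i in range(n):
        ai, ci = a[i], c[i]
        for k in range(n):
            aik, bk = ai[k], b[k]
            for j in range(n):
                ci[j] += aik * bk[j]


def estimate_k(n: int, repetitions: int = 3) -> float:
    """Estimate seconds per flop of the kernel on n x n operands,
    keeping the fastest of several runs to reduce noise."""
    a = [[1.0] * n for _ in range(n)]
    b = [[0.5] * n for _ in range(n)]
    best = float("inf")
    for _ in range(repetitions):
        c = [[0.0] * n for _ in range(n)]  # fresh output for each run
        start = time.perf_counter()
        naive_matmul(a, b, c)
        best = min(best, time.perf_counter() - start)
    flops = 2 * n ** 3  # one multiply and one add per inner iteration
    return best / flops


if __name__ == "__main__":
    for n in (32, 64):
        print(f"n={n}: k ~ {estimate_k(n):.3e} s/flop")
```

Taking the minimum over repetitions filters out transient load; on a system with variable workload the mean may better reflect the current conditions.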

Page 13

Design

[Figure: the scheme, DESIGN (LAR-DESIGNER): MODELLING LAR, producing LAR-MOD.]

Page 14

Design: Making the LAR-ERs

[Figure: the scheme, DESIGN (LAR-DESIGNER): MODELLING LAR, producing LAR-MOD; IMPLEMENTATION OF LAR-ERs, producing the LAR-ERs.]

Page 15

LAR-ERs: Estimation Routines

Arithmetic System Parameters (ASPs): the computation kernel of the LAR is used to build the Estimation Routine, with
• a similar storage scheme
• a similar quantity of data

Communication System Parameters (CSPs): the communication kernel of the LAR is used to build the Estimation Routine, with
• a similar kind of communication
• a similar quantity of data
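For the CSPs, an estimation routine could time messages of several sizes with the LAR's kind of communication and fit the linear model t = ts + m·tw to the measurements. A minimal sketch of the fitting step, assuming the (message size, time) pairs have already been measured; `fit_ts_tw` is a hypothetical name, and ordinary least squares is one possible fitting choice, not necessarily the authors':

```python
def fit_ts_tw(sizes, times):
    """Least-squares fit of t = ts + m * tw to (message size m, time t) samples.
    Returns (ts, tw)."""
    n = len(sizes)
    if n < 2 or n != len(times):
        raise ValueError("need at least two (size, time) pairs of equal length")
    mean_m = sum(sizes) / n
    mean_t = sum(times) / n
    sxx = sum((m - mean_m) ** 2 for m in sizes)
    if sxx == 0:
        raise ValueError("message sizes must not all be equal")
    sxy = sum((m - mean_m) * (t - mean_t) for m, t in zip(sizes, times))
    tw = sxy / sxx            # slope: word-sending time
    ts = mean_t - tw * mean_m  # intercept: start-up time
    return ts, tw


if __name__ == "__main__":
    # Synthetic, noise-free samples generated from ts = 50 us, tw = 0.1 us/word.
    sizes = [100, 1_000, 10_000, 100_000]
    times = [50e-6 + m * 0.1e-6 for m in sizes]
    ts, tw = fit_ts_tw(sizes, times)
    print(f"ts = {ts:.2e} s, tw = {tw:.2e} s/word")
```

Spreading the message sizes over several orders of magnitude helps separate the start-up term from the per-word term.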

Page 16

Design

[Figure: the scheme, DESIGN (LAR-DESIGNER): MODELLING LAR, producing LAR-MOD; IMPLEMENTATION OF LAR-ERs, producing the LAR-ERs.]

Page 17

Design: Process has finished

[Figure: the scheme, DESIGN (LAR-DESIGNER): MODELLING LAR, producing LAR-MOD; IMPLEMENTATION OF LAR-ERs, producing the LAR-ERs. Annotation: HAND-MADE, ONLY ONCE.]

Page 18

Installation: Running the LAR-ERs

[Figure: the scheme, DESIGN (LAR-DESIGNER): MODELLING LAR, LAR-MOD, IMPLEMENTATION OF LAR-ERs, LAR-ERs. INSTALLATION (SYSTEM MANAGER): EXECUTION OF LAR-ERs, with LAR-IF, BL and LAR-SPF.]

Page 19

Installation: obtaining the OAP

[Figure: the scheme, DESIGN (LAR-DESIGNER): MODELLING LAR, LAR-MOD, IMPLEMENTATION OF LAR-ERs, LAR-ERs. INSTALLATION (SYSTEM MANAGER): EXECUTION OF LAR-ERs, with LAR-IF, BL and LAR-SPF; OAP SELECTION, with LAR-OAPF.]

Page 20

Installation: obtaining the OAP

Algorithmic Parameters (APs)

Once the SP values are known, the optimum values of the APs (OAP) are calculated:
• b: block size
• p: number of processors
• r × c: logical topology, i.e. the grid configuration (logical 2D mesh)
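Once the SPs are known, selecting the OAP amounts to evaluating the analytical model for each candidate combination of APs and keeping the cheapest. A minimal sketch by exhaustive search; the cost model `t_exec`, the function `k3(b)`, the SP values and the candidate block sizes are all hypothetical and only illustrate the trade-offs (a larger b lowers the start-up and level-3 costs but raises the level-2 panel work):

```python
import math
from itertools import product

# Hypothetical SP values, standing in for what the installation would store
# (illustrative numbers only, not measurements).
TS = 50e-6   # ts: start-up time (s)
TW = 0.1e-6  # tw: word-sending time (s/word)
K2 = 10e-9   # k2: cost per flop in level-2 BLAS (s)


def k3(b: int) -> float:
    """Hypothetical k3(b): level-3 BLAS cost per flop, improving with block size."""
    return 2e-9 * (1 + 8 / b)


def t_exec(n: int, b: int, r: int, c: int) -> float:
    """Illustrative (not the authors') model of a block factorization
    of size n on an r x c logical mesh of p = r * c processors."""
    p = r * c
    arith = (2 * n**3 / (3 * p)) * k3(b) + (n**2 * b / r) * K2
    comm = ((n / b) * (math.log2(r) + math.log2(c)) * TS
            + (n**2 / 2) * (math.log2(c) / r + math.log2(r) / c) * TW)
    return arith + comm


def select_oap(n: int, max_procs: int, block_sizes=(16, 32, 64, 128)):
    """Return (predicted time, b, r, c) minimising the model over every block
    size and every r x c mesh that uses at most max_procs processors."""
    candidates = (
        (t_exec(n, b, r, c), b, r, c)
        for b, r, c in product(block_sizes,
                               range(1, max_procs + 1),
                               range(1, max_procs + 1))
        if r * c <= max_procs
    )
    return min(candidates)


if __name__ == "__main__":
    t, b, r, c = select_oap(n=2048, max_procs=8)
    print(f"OAP: b={b}, p={r * c} as a {r}x{c} mesh, predicted {t:.3f} s")
```

Exhaustive search is affordable here only because the candidate AP space is small; it is a choice made for the illustration, not necessarily the selection procedure used in the library.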

Page 21

Installation

[Figure: the scheme, DESIGN (LAR-DESIGNER): MODELLING LAR, LAR-MOD, IMPLEMENTATION OF LAR-ERs, LAR-ERs. INSTALLATION (SYSTEM MANAGER): EXECUTION OF LAR-ERs, with LAR-IF, BL and LAR-SPF; OAP SELECTION, with LAR-OAPF.]

Page 22

Installation: putting it all together

[Figure: the complete scheme, DESIGN (LAR-DESIGNER): MODELLING LAR, LAR-MOD, IMPLEMENTATION OF LAR-ERs, LAR-ERs. INSTALLATION (SYSTEM MANAGER): EXECUTION OF LAR-ERs, with LAR-IF, BL and LAR-SPF; OAP SELECTION, with LAR-OAPF; LIBRARY INCLUSION PROCESS.]

Page 23

Installation process finished

[Figure: the complete scheme, DESIGN (LAR-DESIGNER): MODELLING LAR, LAR-MOD, IMPLEMENTATION OF LAR-ERs, LAR-ERs. INSTALLATION (SYSTEM MANAGER): EXECUTION OF LAR-ERs, with LAR-IF, BL and LAR-SPF; OAP SELECTION, with LAR-OAPF; LIBRARY INCLUSION PROCESS.]

Page 24

Experiments

• LAR: Least Squares Toeplitz Routine. Platform: network of PCs
• LAR: One-sided Block Jacobi Method to solve the Symmetric Eigenvalue Problem. Platform: SGI Origin 2000
• LAR: Gaussian elimination. Platform: NoW (heterogeneous system)
• LAR: block LU factorization. Platforms: IBM SP2, SGI Origin 2000, NoW

Basic libraries: reference BLAS, machine BLAS, ATLAS

Page 25

Quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4 and 8 processors.

[Figure: LU on IBM SP2. x-axis: problem size, 512 to 3584; y-axis: quotient, 0 to 1.4; series: SEQ, PAR4, PAR8.]
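The quotient plotted in these figures can be computed as below, assuming the optimum execution time is the lowest time measured over all the parameter values tried (the slides do not state how the optimum was obtained). With that definition the quotient is at least 1, and 1 means the model selected the best parameters. The function `quotient` and the sample timings are hypothetical:

```python
def quotient(measured: dict, model_choice) -> float:
    """Time obtained with the model-selected parameters divided by the
    best measured time over all tested parameter values (>= 1)."""
    t_opt = min(measured.values())
    return measured[model_choice] / t_opt


if __name__ == "__main__":
    # Hypothetical timings (s) for three block sizes; the model picked b = 64.
    times = {16: 2.5, 32: 2.0, 64: 2.25}
    print(quotient(times, 64))
```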

Page 26

Quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4, 8 and 16 processors.

[Figure: LU on Origin 2000. x-axis: problem size, 512 to 3584; y-axis: quotient, 0 to 1.4; series: SEQ, PAR4, PAR8, PAR16.]

Page 27

Quotient between the execution time with the parameters provided by the model and the optimum execution time, in the sequential case and in parallel with 4 processors, using machine BLAS and ATLAS as basic libraries.

[Figure: LU on NoW. x-axis: problem size, 512 to 2048; y-axis: quotient, 0 to 1.2; series: SEQ BLAS, SEQ ATLAS, PAR4 BLAS, PAR4 ATLAS.]

Page 28

Future Work

• We are trying to develop a methodology valid for a wide range of systems and to include it in the design of linear algebra libraries: it is necessary to analyse the methodology on more systems and with more routines.
• The basic linear algebra library to use can be considered as another parameter.
• An installation strategy common to a set of routines must be developed.
• At the moment we are analysing routines individually, but it could be preferable to analyse algorithmic schemes.
• We are working on the design of a strategy for parameter selection in dynamic systems.