HPC&A
Impact of Multi-core and Many-core Technology on Dense Linear Algebra
Enrique S. Quintana-Ortí
Berlin, September 2011

Multi-core and Many-core
“The free lunch is over” (H. Sutter, 2005)
• End of the race for GHz
  • Power dissipation is roughly proportional to f³
  • Energy becomes heat
• No more ILP
  • On average, 1 branch every five instructions
  • Also for dense linear algebra?
• No significant improvement in memory latency
  • 1 stall ≈ 240 cycles (2008)
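The cubic dependence on frequency is usually justified with a back-of-the-envelope argument. The sketch below assumes the standard CMOS dynamic-power model and a supply voltage scaled linearly with clock frequency; neither assumption is stated on the slide, and the symbols C (switched capacitance) and V (supply voltage) are introduced here only for illustration.

```latex
P_{\mathrm{dyn}} \approx C \, V^{2} f, \qquad V \propto f
\quad\Longrightarrow\quad P_{\mathrm{dyn}} \propto f^{3}
```

Under this model, halving the clock frequency cuts dynamic power by roughly a factor of eight, which is the usual argument for several slower cores over one fast core.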

Multi-core and Many-core
…but Moore’s Law still holds
[Figure: performance over time (1980, 2005, 2015?, 2025??, 2035???), with successive eras labeled single-thread performance, multi-core processors, heterogeneous multi-core processors, and specific design. Source: P. Hofstee, IBM Austin]

Multi-core and Many-core
• Intel Xeon Nehalem: 8-core
• AMD Opteron Interlagos (Bulldozer): 16-core chip → Q3 2011
• Heterogeneous designs:
  • AMD APUs
  • Intel Sandy Bridge
  • NVIDIA+ARM

Leveraging TLP
• Libraries (dense linear algebra)
  • PLASMA
  • MAGMA
  • libflame
• Tools (general purpose)
  • StarSs
  • StarPU

Leveraging TLP
Cholesky: LAPACK+MT-BLAS
$A_{11} = L_{11} L_{11}^{T}$
$A_{21} := L_{21} = A_{21} L_{11}^{-T}$
$A_{22} := A_{22} - L_{21} L_{21}^{T}$
…
[Figure: 1st iteration, 2nd iteration, 3rd iteration]

Leveraging TLP
Cholesky: LAPACK+MT-BLAS
Some parallelism can be regained with a look-ahead, but:
• It complicates programming
• It hampers portability
[Figure: 1st iteration]

Leveraging TLP
Cholesky: More concurrency available
[Figure: operations that can run concurrently, both within the same iteration and across different iterations]

Leveraging TLP
Cholesky: More concurrency available
1. Decompose the problem into tasks
2. Scheduler (run-time) executes the tasks:
   1. Keep track of dependencies
   2. Maximize concurrency (with priorities)
   3. Balance the load (possibly dynamically)
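To give a feel for how much concurrency a task decomposition exposes, the sketch below enumerates the tasks of a tiled Cholesky factorization on an nt × nt grid of tiles and computes the longest chain of dependent tasks. It is a simplified model, not any run-time's actual scheduler: it assumes one task per tile kernel, unit task cost, unlimited cores, and it tracks only the last writer of each tile. The names `dag_stats` and `cholesky_dag_stats` are hypothetical.

```c
#include <assert.h>

#define MAX_NT 64

/* Summary of the task graph of a tiled Cholesky on nt x nt tiles. */
typedef struct {
    long tasks;  /* total number of tile kernels (tasks)            */
    int  depth;  /* longest chain of dependent tasks (critical path) */
} dag_stats;

static int max2(int a, int b) { return a > b ? a : b; }

dag_stats cholesky_dag_stats(int nt)
{
    assert(nt >= 1 && nt <= MAX_NT);
    /* done[i][j]: step at which the last task writing tile (i,j) finishes */
    static int done[MAX_NT][MAX_NT];
    dag_stats s = {0, 0};

    for (int i = 0; i < nt; i++)
        for (int j = 0; j < nt; j++)
            done[i][j] = 0;

    for (int k = 0; k < nt; k++) {
        /* Factorization of the diagonal tile: reads and writes (k,k) */
        done[k][k] = done[k][k] + 1;
        s.tasks++;
        s.depth = max2(s.depth, done[k][k]);

        /* Triangular solves: read (k,k), write (i,k) */
        for (int i = k + 1; i < nt; i++) {
            done[i][k] = max2(done[k][k], done[i][k]) + 1;
            s.tasks++;
            s.depth = max2(s.depth, done[i][k]);
        }

        /* Trailing updates: read (i,k) and (j,k), write (i,j) */
        for (int i = k + 1; i < nt; i++)
            for (int j = k + 1; j <= i; j++) {
                int in = max2(done[i][k], done[j][k]);
                done[i][j] = max2(in, done[i][j]) + 1;
                s.tasks++;
                s.depth = max2(s.depth, done[i][j]);
            }
    }
    return s;
}
```

For nt = 4 this model yields 20 tasks and a critical path of 10 tasks. The ratio of the two is a rough upper bound on the average parallelism a dependency-aware scheduler can extract.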

PLASMA: Overview
• Univ. Tennessee, Univ. California @ Berkeley, Univ. Colorado @ Denver
• Support for multi-core platforms
• Support for complex/real, single/double
• Covers BLAS-3, linear system solvers, and partially SVD and EVD
• Column-major order accepted, internally converted to tile layout
• Static or dynamic scheduler (QUARK)
• Licensed: free redistribution and reuse
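The conversion from column-major order to a tile layout can be illustrated with a short C routine. This is a minimal sketch, not PLASMA's code: it assumes square nb × nb tiles, a tile size that divides n, tiles ordered column by column, and column-major storage inside each tile. The function name `colmajor_to_tiles` is hypothetical, and the library's internal layout may differ in its details.

```c
#include <assert.h>
#include <stddef.h>

/* Copy an n x n column-major matrix A (leading dimension lda) into T,
 * where each nb x nb tile occupies nb*nb contiguous elements. */
void colmajor_to_tiles(const double *A, int lda, int n, int nb, double *T)
{
    assert(nb > 0 && n % nb == 0 && lda >= n);
    int nt = n / nb;  /* number of tiles per dimension */

    for (int tj = 0; tj < nt; tj++)          /* tile column */
        for (int ti = 0; ti < nt; ti++) {    /* tile row    */
            double *tile = T + ((size_t)tj * nt + ti) * nb * nb;
            for (int j = 0; j < nb; j++)
                for (int i = 0; i < nb; i++)
                    tile[(size_t)j * nb + i] =
                        A[(size_t)(tj * nb + j) * lda + (ti * nb + i)];
        }
}
```

Storing each tile contiguously means that a task operating on one tile touches a single compact block of memory, which tends to help cache and TLB behavior.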

PLASMA: Functionality
• Linear systems
  • LU w/partial pivoting, Cholesky
  • Overdetermined, underdetermined linear systems via QR and LQ
  • Explicit matrix inversion
  • Mixed precision iterative refinement
• Symmetric Eigenvalue Problem (only eigenvalues, no eigenvectors yet)
  • Reduction to “block tridiagonal” (band)
  • Band reduction (bulge chasing)
  • QR iteration (LAPACK)
• Singular Value Problem (only singular values, no singular vectors yet)
  • Reduction to “block bidiagonal” (band)
  • Band reduction (bulge chasing)
  • QR algorithm (LAPACK)
• Generalized Eigenvalue Problem (s.p.d. matrices only)
  • Cholesky factorization of B and application to A
  • Reduction to band & band reduction (bulge chasing)
  • QR algorithm (LAPACK)
Thanks to J. Kurzak, ICL ‐ UTK
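Mixed precision iterative refinement, listed above under linear systems, can be sketched as follows: factor the matrix once in single precision, then repeatedly compute the residual in double precision and solve for a correction with the cheap single-precision factors. This is an illustrative sketch only. It assumes a small dense matrix that can be factored without pivoting (for example, a diagonally dominant one), and the names `solve_mixed`, `lu_factor_f`, `lu_solve_f` and `MAXN` are hypothetical, not PLASMA routines.

```c
#include <assert.h>
#include <math.h>

#define MAXN 16

/* In-place LU factorization in single precision, without pivoting
 * (an assumption of this sketch; a production code would pivot). */
static void lu_factor_f(float LU[MAXN][MAXN], int n)
{
    for (int k = 0; k < n; k++)
        for (int i = k + 1; i < n; i++) {
            LU[i][k] /= LU[k][k];
            for (int j = k + 1; j < n; j++)
                LU[i][j] -= LU[i][k] * LU[k][j];
        }
}

/* Solve L U x = b in single precision; x holds b on entry. */
static void lu_solve_f(float LU[MAXN][MAXN], int n, float *x)
{
    for (int i = 1; i < n; i++)          /* forward substitution, unit L */
        for (int j = 0; j < i; j++)
            x[i] -= LU[i][j] * x[j];
    for (int i = n - 1; i >= 0; i--) {   /* back substitution with U */
        for (int j = i + 1; j < n; j++)
            x[i] -= LU[i][j] * x[j];
        x[i] /= LU[i][i];
    }
}

/* Solve A x = b. Returns the number of refinement steps performed,
 * or -1 if the residual did not drop below tol within max_iter steps. */
int solve_mixed(double A[MAXN][MAXN], const double *b, double *x,
                int n, double tol, int max_iter)
{
    assert(n >= 1 && n <= MAXN);
    float LU[MAXN][MAXN], c[MAXN];
    double r[MAXN];

    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++)
            LU[i][j] = (float)A[i][j];
        c[i] = (float)b[i];
    }
    lu_factor_f(LU, n);                  /* O(n^3) work in low precision */
    lu_solve_f(LU, n, c);
    for (int i = 0; i < n; i++)
        x[i] = c[i];

    for (int it = 0; ; it++) {
        double rmax = 0.0;
        for (int i = 0; i < n; i++) {    /* r = b - A x in double precision */
            r[i] = b[i];
            for (int j = 0; j < n; j++)
                r[i] -= A[i][j] * x[j];
            if (fabs(r[i]) > rmax)
                rmax = fabs(r[i]);
        }
        if (rmax <= tol)
            return it;
        if (it == max_iter)
            return -1;
        for (int i = 0; i < n; i++)
            c[i] = (float)r[i];
        lu_solve_f(LU, n, c);            /* correction, O(n^2) in float */
        for (int i = 0; i < n; i++)
            x[i] += c[i];
    }
}
```

The appeal of the scheme is that the cubic-cost factorization runs at single-precision speed, while the quadratic-cost refinement steps recover double-precision accuracy for reasonably conditioned problems.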

PLASMA: Interface
    PLASMA_Init(0);

    PLASMA_Desc_Create(&descA, ...);
    PLASMA_Desc_Create(&descB, ...);

    PLASMA_dLapack_to_Tile(A, lda, descA);
    PLASMA_dLapack_to_Tile(B, ldb, descB);

    PLASMA_dgesv_Tile(descA, IPIV, descB);

    PLASMA_Desc_Destroy(&descA);
    PLASMA_Desc_Destroy(&descB);

    PLASMA_Finalize();
Thanks to J. Kurzak, ICL ‐ UTK

PLASMA: Performance
[Performance plots on four slides; the last one is captioned “AMD 48 cores – J. Kurzak, ISC’11”]

MAGMA: Overview
• Univ. Tennessee, Univ. California @ Berkeley, Univ. Colorado @ Denver
• Support for hybrid platforms: one or more multi-core processors + 1 GPU (no multi-GPU)
• Support for complex/real, single/double
• Static scheduler
• Licensed: free redistribution and reuse

MAGMA: Functionality
• Linear systems
  • LU, QR, and Cholesky factorizations
  • Mixed-precision iterative refinement
• Eigenproblems and singular value problems
  • Hessenberg, bidiagonal, and tridiagonal reductions
• Generalized Hermitian-definite eigenproblem solvers

MAGMA: Interface (F90)
    magma_devptr_t :: devptrA, devptrB

    !------ Allocate GPU memory
    stat = cublas_alloc(ldda*n, sizeof_double, devPtrA)

    !---- devPtrA = h_A
    call cublas_set_matrix(n, n, size_of_elt, h_A, lda, devptrA, ldda)

    call magmaf_dgetrf_gpu(n, n, devptrA, ldda, ipiv, info)

MAGMA: Performance
[Performance plots on four slides]

libflame: Overview
• Univ. Texas @ Austin
• Object-oriented API
• Support for multi-core and multi-GPU platforms
• Support for complex/real, single/double
• Covers BLAS-1, -2, -3 and linear system solvers. No SVD, EVD
• lapack2flame compatibility layer
• Column-major order or storage-by-blocks
• MT-BLAS or dynamic scheduling (SuperMatrix)
• Licensed as LGPL

libflame: Functionality
| Operation | Classic FLAME | FLASH/SM | lapack2flame |
|---|---|---|---|
| Level-3 BLAS | y | y | n/a |
| Cholesky | y | y | y |
| LU with partial pivoting | y | y | y |
| LU with incremental pivoting | y | y | * |
| QR (UT) | y | y | y |
| LQ (UT) | y | y | y |
| SPD/HPD inversion | y | y | y |
| Triangular inversion | y | y | y |
| Triangular Sylvester | y | y | y |
| Lyapunov | y | y | y |
| Up-and-downdate (UT) | y | y | * |

SVD: planned
EVD: planned
* Not present in LAPACK

libflame: Interface
    FLA_Obj A, p, b, x;

    FLA_Init();

    FLA_Obj_create( FLA_DOUBLE, m, m, 0, 0, &A );
    FLA_Copy_buffer_to_object( FLA_NO_TRANSPOSE, m, m,
                               buffer, rs, cs, 0, 0, A );
    ...
    // Initialize A, b, x.

    FLA_LU_piv( A, p );
    FLA_LU_piv_solve( A, p, b, x );

    FLA_Obj_free( &A );
    ...

libflame: Performance
[Performance plots on four slides]

StarSs: Overview
• Barcelona Supercomputing Center
• A parallelization framework (tool), not a library:
  • OpenMP-like pragmas
  • Source-to-source compiler
  • Run-time library
• Support for multi-core, multi-accelerator (Cell, GPU), etc. in different instances: SMPSs, CellSs, GPUSs, etc.
• Licensed as LGPL

SMPSs: Interface
[Figure: each matrix is partitioned into NB × NB tiles of BS × BS elements]

    int main (int argc, char **argv) {
      int i, j, k;
      ...
      initialize(A, B, C);

      for (i=0; i < NB; i++)
        for (j=0; j < NB; j++)
          for (k=0; k < NB; k++)
            mm_tile( C[i][j], A[i][k], B[k][j]);
    }

    #pragma omp task input([BS][BS]A,[BS][BS]B) \
                     inout([BS][BS]C)
    static void mm_tile ( float C[BS][BS],
                          float A[BS][BS],
                          float B[BS][BS]) {
      int i, j, k;

      for (i=0; i< BS; i++)
        for (j=0; j< BS; j++)
          for (k=0; k< BS; k++)
            C[i][j] += A[i][k] * B[k][j];
    }

GPUSs: Interface

    /* A: array of NB*NB pointers, each to a BS x BS tile */
    void blocked_cholesky( int NB, float *A[] ) {
      int i, j, k;
      for (k=0; k<NB; k++) {
        spotrf (A[k*NB+k]);
        for (i=k+1; i<NB; i++)
          strsm (A[k*NB+k], A[k*NB+i]);
        for (i=k+1; i<NB; i++) {
          for (j=k+1; j<i; j++)
            sgemm( A[k*NB+i], A[k*NB+j], A[j*NB+i]);
          ssyrk (A[k*NB+i], A[i*NB+i]);
        }
      }
      #pragma omp task wait
    }

    #pragma omp target device (cuda) copy_deps
    #pragma omp task inout([BS][BS]A)
    void spotrf (float *A);

    #pragma omp target device (cuda) copy_deps
    #pragma omp task input ([BS][BS]A) inout([BS][BS]C)
    void ssyrk (float *A, float *C);

    #pragma omp target device (cuda) copy_deps
    #pragma omp task input ([BS][BS]A, [BS][BS]B) inout([BS][BS]C)
    void sgemm (float *A, float *B, float *C);

    #pragma omp target device (cuda) copy_deps
    #pragma omp task input ([BS][BS]T) inout([BS][BS]B)
    void strsm (float *T, float *B);

GPUSs: Performance
[Performance plots: matrix-matrix multiply and blocked Cholesky]

Control Problems in GPUs
• Matrix inversion via GJE: general and s.p.d.
• Lyapunov and Riccati sign-function solvers
• Mixed precision/iterative refinement for Lyapunov equations
• Differential Riccati equations
• Model reduction via BT and BST based on those
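Matrix inversion via Gauss-Jordan elimination (GJE) can be illustrated with a plain sequential routine. This is only a sketch of the textbook algorithm with partial pivoting for a small general matrix, not the blocked GPU implementation behind the results of the talk. The name `gje_invert` and the size limit `GJ_MAXN` are hypothetical.

```c
#include <assert.h>
#include <math.h>

#define GJ_MAXN 16

/* Invert the n x n matrix A by Gauss-Jordan elimination with partial
 * pivoting. A is overwritten (reduced to the identity) and Ainv receives
 * the inverse. Returns 0 on success, 1 if a zero pivot is encountered. */
int gje_invert(double A[GJ_MAXN][GJ_MAXN], double Ainv[GJ_MAXN][GJ_MAXN],
               int n)
{
    assert(n >= 1 && n <= GJ_MAXN);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            Ainv[i][j] = (i == j) ? 1.0 : 0.0;

    for (int k = 0; k < n; k++) {
        /* Pick the largest entry in column k, rows k..n-1, as pivot */
        int p = k;
        for (int i = k + 1; i < n; i++)
            if (fabs(A[i][k]) > fabs(A[p][k]))
                p = i;
        if (A[p][k] == 0.0)
            return 1;
        if (p != k)
            for (int j = 0; j < n; j++) {
                double t = A[k][j]; A[k][j] = A[p][j]; A[p][j] = t;
                t = Ainv[k][j]; Ainv[k][j] = Ainv[p][j]; Ainv[p][j] = t;
            }

        /* Scale the pivot row so that the pivot becomes 1 */
        double piv = A[k][k];
        for (int j = 0; j < n; j++) {
            A[k][j] /= piv;
            Ainv[k][j] /= piv;
        }

        /* Eliminate column k from every other row, above and below */
        for (int i = 0; i < n; i++) {
            if (i == k)
                continue;
            double f = A[i][k];
            for (int j = 0; j < n; j++) {
                A[i][j] -= f * A[k][j];
                Ainv[i][j] -= f * Ainv[k][j];
            }
        }
    }
    return 0;
}
```

Every elimination step updates all rows of the matrix, so the amount of independent work stays the same from one step to the next, a property that suits massively parallel hardware.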

Remarks
• Impressive potential of new architectures for solving large-scale problems… but likely of interest only to a few SLICOT users
• Libraries too immature to be adopted (e.g., MAGMA and PLASMA may merge)… but they will likely incorporate an LAPACK interface
• No significant change of functionality (decrease?)
• Change in some of the underlying algorithms:
  • Incremental QR factorization
  • LU with incremental pivoting
  • Two-stage reduction to condensed forms
  • Iterative refinement?
• GPUs may be integrated with “regular” cores, but it will be at the expense of #cores

Acks
• Thanks to J. Kurzak and P. Luszczek (The University of Tennessee at Knoxville) for the results with PLASMA and MAGMA
• Thanks to F. Igual (The University of Texas at Austin) for the results with libflame