polly polyhedral optimizations for llvmpouchet/doc/impact-slides.11.pdfi actionscript (adobe) i...

27
Polly Polyhedral Optimizations for LLVM Tobias Grosser - Hongbin Zheng - Raghesh Aloor Andreas Simb¨ urger - Armin Gr¨ osslinger - Louis-No¨ el Pouchet April 03, 2011 Polly - Polyhedral Optimizations for LLVM April 03, 2011 1 / 27

Upload: others

Post on 17-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

PollyPolyhedral Optimizations for LLVM

Tobias Grosser - Hongbin Zheng - Raghesh AloorAndreas Simburger - Armin Grosslinger - Louis-Noel Pouchet

April 03, 2011

Polly - Polyhedral Optimizations for LLVM April 03, 2011 1 / 27

Page 2: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Polyhedral today

Good polyhedral libraries

Good solutions to some problems (Parallelisation, Tiling, GPGPU)

Several successfull research projects

First compiler integrations

but still limited IMPACT.

Can Polly help to change this?

Polly - Polyhedral Optimizations for LLVM April 03, 2011 2 / 27

Page 3: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Outline

1 LLVM

2 Polly - Concepts & Implementation

3 Experiments

3 Future Work + Conclusion

Polly - Polyhedral Optimizations for LLVM April 03, 2011 3 / 27

Page 4: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

LLVM

Compiler Infrastructure

Low Level Intermediate LanguageI SSA, Register MachineI Language and Target IndependentI Integrated SIMD Support

Large Set of Analysis and Optimization

Optimizations Compile, Link, and Run Time

JIT Infrastructure

Very convenient to work with

Polly - Polyhedral Optimizations for LLVM April 03, 2011 4 / 27

Page 5: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Classical Compilers:

I clang → C/C++/Objective-CI Mono → .NetI OpenJDK → JavaI dragonegg → C/C++/Fortran/ADA/GoI Others → Ruby/Python/Lua

GPGPU: PTX backendOpenCL (NVIDIA, AMD, INTEL, Apple, Qualcomm, ...)

Graphics Rendering(VMWare Gallium3D/LLVMPipe/LunarGlass/Adobe Hydra)

WebI ActionScript (Adobe)I Google Native Client

HLS (C-To-Verilog, LegUp, UCLA - autoESL)

Source to Source: LLVM C-Backend

Polly - Polyhedral Optimizations for LLVM April 03, 2011 5 / 27

Page 6: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

The Architecture

LLVM IR LLVM IRPSCoP

SCoP Detection& LLVM to Poly Code Generation

SIMD Backend

OpenMP Backend

JSCoP Import/Export

* Classical loop transformations (Blocking, Interchange, Fusion, ...)* Expose parallelism* Dead instruction elimination / Constant propagation

Transformations

Manual Optimization / LooPo / Pluto / PoCC+Pluto / ...

DependencyAnalysis

PTX Backend

Polly - Polyhedral Optimizations for LLVM April 03, 2011 6 / 27

Page 7: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

The SCoP - Classical Definition

for i = 1 to (5n + 3)

for j = n to (4i + 3n + 4)

A[i-j] = A[i]

if i < (n - 20)

A[i+20] = j

Structured control flowI Regular for loopsI Conditions

Affine expressions in:I Loop bounds, conditions, access functions

Side effect free

Polly - Polyhedral Optimizations for LLVM April 03, 2011 7 / 27

Page 8: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

AST based frameworks

What about:

Goto-based loops

C++ iterators

C++0x foreach loop

Common restrictions

Limited to subset of C/C++

Require explicit annotations

Only canonical code

Correct? (Integer overflow, Operator overloading, ...)

Polly - Polyhedral Optimizations for LLVM April 03, 2011 8 / 27

Page 9: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Semantic SCoP

Thanks to LLVM Analysis and Optimization Passes:

SCoP - The Polly way

Structured control flowI Regular for loops → Anything that acts like a regular for loopI Conditions

Affine expressions → Expressions that calculate an affine result

Side effect free known

Memory accesses through arrays → Arrays + Pointers

Polly - Polyhedral Optimizations for LLVM April 03, 2011 9 / 27

Page 10: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Valid SCoPs

do..while loop

i = 0;

do {

int b = 2 * i;

int c = b * 3 + 5 * i;

A[c] = i;

i += 2;

} while (i < N);

pointer loop

int A[1024];

void pointer_loop () {

int *B = A;

while (B < &A[1024]) {

*B = 1;

++B;

}

}

Polly - Polyhedral Optimizations for LLVM April 03, 2011 10 / 27

Page 11: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Polyhedral Representation - SCoP

SCoP = (Context, [Statement])

Statement = (Domain, Schedule, [Access])

Access = (“read”|“write”|“may write”,Relation)

Interesting:

Data structures are integer sets/maps

Domain is read-only

Schedule can be partially affine

Access is a relation

Access can be may write

Polly - Polyhedral Optimizations for LLVM April 03, 2011 11 / 27

Page 12: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Applying transformations

D = {Stmt[i , j ] : 0 <= i < 32 ∧ 0 <= j < 1000}S = {Stmt[i , j ]→ [i , j ]}

S ′ = S

for (i = 0; i < 32; i++)

for (j = 0; j < 1000; j++)

A[i][j] += 1;

Polly - Polyhedral Optimizations for LLVM April 03, 2011 12 / 27

Page 13: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Applying transformations

D = {Stmt[i , j ] : 0 <= i < 32 ∧ 0 <= j < 1000}S = {Stmt[i , j ]→ [i , j ]}TInterchange = {[i , j ]→ [j , i ]}

S ′ = S ◦ TInterchange

for (j = 0; j < 1000; j++)

for (i = 0; i < 32; i++)

A[i][j] += 1;

Polly - Polyhedral Optimizations for LLVM April 03, 2011 13 / 27

Page 14: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Applying transformations

D = {Stmt[i , j ] : 0 <= i < 32 ∧ 0 <= j < 1000}S = {Stmt[i , j ]→ [i , j ]}TInterchange = {[i , j ]→ [j , i ]}TStripMine = {[i , j ]→ [i , jj , j ] : jj mod 4 = 0 ∧ jj <= j < jj + 4}S ′ = S ◦ TInterchange ◦ TStripMine

for (j = 0; j < 1000; j++)

for (ii = 0; ii < 32; ii+=4)

for (i = ii; i < ii+4; i++)

A[i][j] += 1;

Polly - Polyhedral Optimizations for LLVM April 03, 2011 14 / 27

Page 15: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

JSCoP - Exchange format

Specification:

Representation of a SCoP

Stored as JSON text file

Integer Sets/Maps use ISL Representation

Benefits:

Can express modern polyhedral representation

Can be imported easily (JSON bindings readily available)

Is already valid Python

Polly - Polyhedral Optimizations for LLVM April 03, 2011 15 / 27

Page 16: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

JSCoP - Example

{

"name": "body => loop.end",

"context": "[N] -> { []: N >= 0 }",

"statements": [{

"name": "Stmt",

"domain": "[N] -> { Stmt[i0, i1] : 0 <= i0, i1 <= N }",

"schedule": "[N] -> { Stmt[i0, i1] -> scattering[i0, i1] }",

"accesses": [{

"kind": "read",

"relation": "[N] -> { Stmt[i0, i1] -> A[o0] }"

},

{

"kind": "write",

"relation": "[N] -> { Stmt[i0, i1] -> C[i0][i1] }"

}]

}]

}

Polly - Polyhedral Optimizations for LLVM April 03, 2011 16 / 27

Page 17: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Optimized Code Generation

Automatically detect parallelism,

after code generation

Automatically transform it to:

I OpenMP, if loopF is parallelF is not surrounded by any other parallel loop

I Efficient SIMD instructions, if loopF is innermostF is parallelF has constant number of iterations

Polly - Polyhedral Optimizations for LLVM April 03, 2011 17 / 27

Page 18: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Generation of Parallel Code

for (i = 0; i < N; i++) for (j = 0; j < N; j++) for (kk = 0; kk < 1024; kk++) for (k = kk; k < kk+4; k++) A[j][k] += 9;

for (j = 0; j < M; j++) B[i] = B[i] * i;AAA

Polly - Polyhedral Optimizations for LLVM April 03, 2011 18 / 27

Page 19: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Generation of Parallel Code

for (i = 0; i < N; i++)

for (j = 0; j < N; j++)for (kk = 0; kk < 1024; kk++)for (k = kk; k < kk+4; k++)A[j][k] += 9;

for (j = 0; j < M; j++)B[i] = B[i] * i;AAA

S = {[i, 0, j, ...] : 0 <= i, j < N}

S = {[i, 1, j, ...] : 0 <= i, j < N}

Polly - Polyhedral Optimizations for LLVM April 03, 2011 19 / 27

Page 20: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Generation of Parallel Code

for (i = 0; i < N; i++) #pragma omp parallel for (j = 0; j < N; j++) for (kk = 0; kk < 1024; kk++) for (k = kk; k < kk+4; k++) A[j][k] += 9;

for (j = 0; j < M; j++) B[i] = B[i] * i;AAA

Polly - Polyhedral Optimizations for LLVM April 03, 2011 20 / 27

Page 21: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Generation of Parallel Code

for (i = 0; i < N; i++) #pragma omp parallel for (j = 0; j < N; j++) for (kk = 0; kk < 1024; kk++) for (k = kk; k < kk+4; k++) A[j][k] += 9;

for (j = 0; j < M; j++) B[i] = B[i] * i;AAA

Polly - Polyhedral Optimizations for LLVM April 03, 2011 21 / 27

Page 22: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Generation of Parallel Code

for (i = 0; i < N; i++) #pragma omp parallel for (j = 0; j < N; j++) for (kk = 0; kk < 1024; kk++) A[j][kk:kk+3] += [9,9,9,9];

for (j = 0; j < M; j++) B[i] = B[i] * i;AAA

Polly - Polyhedral Optimizations for LLVM April 03, 2011 22 / 27

Page 23: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Optimizing of Matrix Multiply

0

1

2

3

4

5

6

7

8

9

clang -O3

gcc -ffast-math -O3

icc -fastPolly: Only LLVM -O3

Polly: + Strip mining

Polly: += Vectorization

Polly: += Hoisting

Polly: += Unrolling

Spe

edup

32x32 double, Transposed matric Multiply, C[i][j] += A[k][i] * B[j][k];

Intel R© Core R© i5 @ 2.40GH, polly and clang from 23. March 2011

Polly - Polyhedral Optimizations for LLVM April 03, 2011 23 / 27

Page 24: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Pluto Tiling on Polybench

Polybench 2.0 (large data set), Intel R© Xeon R© X5670 @ 2.93GH

polly and clang from 23. March 2011

Polly - Polyhedral Optimizations for LLVM April 03, 2011 24 / 27

Page 25: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Current Status

LLVM IR LLVM IRPSCoP

SCoP Detection& LLVM to Poly Code Generation

SIMD Backend

OpenMP Backend

JSCoP Import/Export

* Classical loop transformations (Blocking, Interchange, Fusion, ...)* Expose parallelism* Dead instruction elimination / Constant propagation

Transformations

Manual Optimization / LooPo / Pluto / PoCC+Pluto / ...

DependencyAnalysis

Usable for experiments

Planned

Under Construction

PTX Backend

Polly - Polyhedral Optimizations for LLVM April 03, 2011 25 / 27

Page 26: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Future Work

Increase general coverage

Expose more SIMDization opportunities

Modifieable Memory Access Functions

GPU code generation

Polly - Polyhedral Optimizations for LLVM April 03, 2011 26 / 27

Page 27: Polly Polyhedral Optimizations for LLVMpouchet/doc/impact-slides.11.pdfI ActionScript (Adobe) I Google Native Client HLS (C-To-Verilog, LegUp,UCLA - autoESL) ... Common restrictions

Polly - Conclusion

Automatic SCoP Extraction

Non canonical SCoPs

Modern Polyhedral Representation

JSCoP - Connect External Optimizers

OpenMP/SIMD/PTX backends

What features do we miss to apply YOUR optimizations?

http://wiki.llvm.org/Polly

Polly - Polyhedral Optimizations for LLVM April 03, 2011 27 / 27