CR18: Advanced Compilers
L01 Introduction
Tomofumi Yuki
Myself
Tomofumi Yuki: researcher at Inria
Ph.D. from Colorado State University in 2012: up to high school in Japan; CSU for all of bachelor's, master's, and Ph.D.
Member of Compsys @ LIP: compilers/languages, automatic parallelization
This Course
Part I: High-level (loop-level) transformations, for parallelism and data locality
Part II: High-Level Synthesis (C to hardware)
Compiler Optimizations
Low-level optimizations: register allocation, instruction scheduling, constant propagation, ...
High-level optimizations (our focus): loop transformations, coarse-grained parallelism, ...
High-Level Optimizations
Goals: Parallelism and Data Locality
Why Parallelism?
Why Data Locality?
Why High-Level?
Why Loop Transformations?
The 90/10 Rule: “90% of the execution time is spent in less than 10% of the source code”
Loop nests: the hotspot of almost all programs; a few lines of change => huge impact; a natural source of parallelism
Why Loop Transformations?
Which is faster?
for (i=0; i<N; i++) for (j=0; j<N; j++) for (k=0; k<N; k++) C[i][j] += A[i][k] * B[k][j];
for (i=0; i<N; i++) for (k=0; k<N; k++) for (j=0; j<N; j++) C[i][j] += A[i][k] * B[k][j];
Why is it Faster?
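Hardware Prefetching
for (i=0; i<N; i++) for (j=0; j<N; j++) for (k=0; k<N; k++) C[i][j] += A[i][k] * B[k][j];
Innermost loop k: C[i][j] unchanged, A[i][k] next column, B[k][j] next row
for (i=0; i<N; i++) for (k=0; k<N; k++) for (j=0; j<N; j++) C[i][j] += A[i][k] * B[k][j];
Innermost loop j: C[i][j] next column, A[i][k] unchanged, B[k][j] next column
C arrays are stored row-major, so "next column" is the adjacent memory address, a stride-1 stream the hardware prefetcher follows easily, while "next row" jumps N elements ahead on every iteration.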
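A minimal sketch of the two loop orders, assuming N x N matrices stored as flat row-major `double` arrays (the function names `matmul_ijk` and `matmul_ikj` are hypothetical). Both versions perform the same additions in the same order for every element of C, so they produce identical results; only the access pattern of the innermost loop changes.

```c
/* C += A*B for N x N row-major matrices stored as flat arrays:
   element [r][c] lives at index r*N + c. */

/* i-j-k order: the innermost loop walks B down a column (stride N). */
void matmul_ijk(int N, const double *A, const double *B, double *C) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
}

/* i-k-j order: the innermost loop walks C and B along a row (stride 1),
   while A[i][k] stays fixed. */
void matmul_ikj(int N, const double *A, const double *B, double *C) {
    for (int i = 0; i < N; i++)
        for (int k = 0; k < N; k++)
            for (int j = 0; j < N; j++)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
}
```

Timing both functions with a large N typically shows the i-k-j version running faster, but the size of the gap depends on the machine, the cache hierarchy, and the compiler flags.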
How to Automate?
The most challenging part! The same optimization doesn’t work for:
for (i=0; i<N; i++) for (j=0; j<N; j++) for (k=0; k<N; k++) { C1[i][j] += A1[i][k] * B1[k][j]; C2[i][j] += A2[i][k] * B2[k][j]; C3[i][j] += A3[i][k] * B3[k][j]; C4[i][j] += A4[i][k] * B4[k][j];}
Why?
It’s Not Just Transformations
Many, many reasoning steps: What to apply? How to apply? When to apply? What is its impact?
Quality of the analysis: How long does it take? Can it potentially degrade performance? Provable properties (completeness, etc.)
Compiler research is all about coming up with techniques/abstractions/representations that allow the compiler to perform deep analysis.
Today’s Agenda
The big picture: programming languages, compilers
Basic concepts: iteration spaces and loop nests, polyhedral domains and functions, parametric integer programming
Short history of the polyhedral model
Compiler Advances
Old compiler vs. recent compiler, on a modern architecture, with different versions of gcc
How much speedup by the compiler alone after 20 years of research?
A 2x difference after 20 years (anecdotal)
Not so much?
“The most remarkable accomplishment by far of the compiler field is the widespread use of high-level languages.”
by Mary Hall, David Padua, and Keshav Pingali [Compiler Research: The Next 50 Years, 2009]
Placement of Compiler Research: Part of Programming Languages
[Figure: compilers alongside the other programming-language research areas: runtime systems, program verification, type theory, program synthesis, program analysis, program transformation]
Earlier Accomplishments
Getting efficient assembly: register allocation, instruction scheduling, ...
High-level language features: object orientation, dynamic types, automated memory management, ...
New twists
New machines: SIMD, IBM Cell, GPGPU, Xeon Phi
New language features: even Java has lambda functions now; parallelism-oriented features
New types of apps: smartphones, tablets
New goals: energy and security
Recent research topics
Parallelism: multi-cores, GPUs, ...; language features for parallelism
Security/Reliability: verification, certified compilers
Power/Energy: data movement, voltage scaling
Goals of the Compiler
Higher abstraction: no more writing assembly! Enables language features: loops, functions, classes, aspects, ...
Performance while increasing productivity: speed, space, energy, ...; compiler optimizations
Personal view: the compiler is there to allow lazy programming
Job Market
Where do they work? IBM, MathWorks, Amazon, start-ups, Apple
Many opportunities in France: MathWorks @ Grenoble, many start-ups
Program IR
Abstract Syntax Tree: the basic representation within compilers
How do we inspect the AST to determine whether a loop is parallel?
for (i in 1..N) A[i] = B[i] + 1;
[Figure: the AST of this loop: NodeFor (iterator=i, LB=1, UB=N) whose body is a NodeAssignment of A[i] from a NodeBinOp (op=+) applied to B[i] and 1]
Not really suitable for high-level analysis
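To make the limitation concrete, here is a toy sketch with a hypothetical node layout modeled on the figure (it is not any real compiler's IR). The parallelism check `naive_is_parallel` is also hypothetical and deliberately syntactic: it only asks whether the written array is read on the right-hand side, because the tree offers no direct way to reason about which iterations touch which elements.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node kinds for a toy AST. */
typedef enum { NODE_FOR, NODE_ASSIGN, NODE_BINOP, NODE_ARRAY_REF, NODE_CONST } NodeKind;

typedef struct Node {
    NodeKind kind;
    const char *name;        /* iterator (FOR) or array name (ARRAY_REF)    */
    const char *index;       /* subscript as text, e.g. "i" (ignored below) */
    const char *lb, *ub;     /* loop bounds as text, e.g. "1" and "N"       */
    char op;                 /* operator for BINOP                          */
    int value;               /* literal for CONST                           */
    struct Node *lhs, *rhs;  /* children; a FOR node keeps its body in lhs  */
} Node;

/* Does array `name` occur anywhere in the subtree rooted at n? */
static bool reads_array(const Node *n, const char *name) {
    if (n == NULL) return false;
    if (n->kind == NODE_ARRAY_REF && strcmp(n->name, name) == 0) return true;
    return reads_array(n->lhs, name) || reads_array(n->rhs, name);
}

/* Naive syntactic test: a loop whose body is a single assignment is
   reported parallel only if the written array never appears on the
   right-hand side. Subscripts are not examined, so it is conservative. */
bool naive_is_parallel(const Node *loop) {
    if (loop == NULL || loop->kind != NODE_FOR) return false;
    const Node *body = loop->lhs;
    if (body == NULL || body->kind != NODE_ASSIGN || body->lhs == NULL) return false;
    return !reads_array(body->rhs, body->lhs->name);
}
```

The test accepts `A[i] = B[i] + 1` but rejects `A[i] = A[i] + 1`, even though every iteration of the latter touches only its own element; answering that correctly requires reasoning about subscripts and iteration instances, which the AST does not expose directly.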
Extended Graphs
Completely unroll the loops
for (i=0; i<5; i++) for (j=1; j<4; j++) { A[i][j] = A[i][j-1] + B[i][j]; }
A[0][1] = A[0][0] + B[0][1];
A[0][2] = A[0][1] + B[0][2];
A[0][3] = A[0][2] + B[0][3];
A[1][1] = A[1][0] + B[1][1];
A[1][2] = A[1][1] + B[1][2];
A[1][3] = A[1][2] + B[1][3];
....
The difficulty: program parameters. It’s “easy” with a DAG representation, but that has scalability issues, and what if the parameters are not known?
for (i=0; i<N; i++) for (j=1; j<M; j++) { A[i][j] = A[i][j-1] + B[i][j]; }
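A sketch of the "completely unroll" approach for this loop, assuming concrete values of N and M are available (the names `Dag`, `Edge`, and `unroll_dag` are hypothetical). It builds one node per statement instance and one edge per flow dependence on A[i][j-1]. The graph has N*(M-1) nodes, so its size grows with the parameters, and it cannot be built at all when N and M are unknown.

```c
#include <stdlib.h>

/* One flow-dependence edge: instance `src` writes the A[i][j-1] that `dst` reads. */
typedef struct { int src, dst; } Edge;

typedef struct {
    int num_nodes;   /* one node per executed statement instance */
    int num_edges;
    Edge *edges;     /* owned by the caller, release with free() */
} Dag;

/* Fully unroll
     for (i=0; i<N; i++) for (j=1; j<M; j++) A[i][j] = A[i][j-1] + B[i][j];
   for concrete N and M. Instance (i,j) gets id i*(M-1) + (j-1),
   i.e. ids follow execution order. */
Dag unroll_dag(int N, int M) {
    Dag g = {0, 0, NULL};
    int per_row = M > 1 ? M - 1 : 0;
    if (N <= 0 || per_row == 0) return g;
    g.num_nodes = N * per_row;
    /* each instance has at most one incoming edge */
    g.edges = malloc(sizeof(Edge) * (size_t)g.num_nodes);
    if (g.edges == NULL) { g.num_nodes = 0; return g; }
    for (int i = 0; i < N; i++)
        for (int j = 1; j < M; j++) {
            int id = i * per_row + (j - 1);
            /* A[i][0] is an input value, so only j >= 2 reads a value
               produced inside the loop nest (by instance (i,j-1)). */
            if (j >= 2) {
                g.edges[g.num_edges].src = id - 1;
                g.edges[g.num_edges].dst = id;
                g.num_edges++;
            }
        }
    return g;
}
```

For the 5 x 3 example above this gives 15 nodes and 10 edges; for N = M = 1000 it already needs close to a million nodes.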
Iteration Spaces
Need an abstraction for statement instances
for (i=0; i<N; i++) for (j=1; j<M; j++) { A[i][j] = A[i][j-1] + B[i][j]; }
instance = integer vector [i,j]
space = integer set {[i,j] : 0≤i<N and 1≤j<M}
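By contrast with the unrolled graph, the set view stays symbolic. In this sketch (the function names are hypothetical), membership is a conjunction of affine inequalities and the instance count is a closed-form expression in N and M. The enumeration is included only to cross-check the formula.

```c
#include <stdbool.h>

/* Iteration space { [i,j] : 0 <= i < N and 1 <= j < M } as a predicate. */
bool in_space(int i, int j, int N, int M) {
    return 0 <= i && i < N && 1 <= j && j < M;
}

/* Number of instances as a function of the parameters. */
long space_size(long N, long M) {
    if (N <= 0 || M <= 1) return 0;
    return N * (M - 1);
}

/* Brute-force count over a slightly larger bounding box, for checking only. */
long space_size_enumerated(int N, int M) {
    long count = 0;
    for (int i = -1; i <= N; i++)
        for (int j = -1; j <= M; j++)
            if (in_space(i, j, N, M)) count++;
    return count;
}
```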
Lexicographic Order
Dictionary order applied to loop nests: a < aaa < aaaa < aab < aba < b
Compare instances: (i,j) is before (i’,j’) iff i<i’, or i=i’ and j<j’
for (i=1; i<N; i++) for (j=1; j<M; j++) S0;
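The comparison rule above generalizes to loop nests of any depth. A minimal sketch (the function name `lex_before` is hypothetical): the first coordinate that differs decides, and equal vectors are not ordered.

```c
#include <stdbool.h>

/* Lexicographic "executes before" for two iteration vectors of equal depth. */
bool lex_before(const int *a, const int *b, int depth) {
    for (int d = 0; d < depth; d++) {
        if (a[d] < b[d]) return true;   /* first difference decides */
        if (a[d] > b[d]) return false;
    }
    return false;                       /* identical instances */
}
```

For the 2-deep nest above, S0<1,5> runs before S0<2,0>, because the outer iterator decides first.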
What is the Polyhedral Model? It Depends (on who you ask)
If you ask me... a compiler Intermediate Representation (IR): linear-algebra based; a compact representation that takes advantage of regularities
Polyhedral Representation
High-level abstraction of the program: iteration space as an integer polyhedron; dependences as affine functions
Usual optimization flow: 1. extract the polyhedral representation; 2. reason about/transform the model; 3. generate code at the end
Polyhedral Domains
Statement instances as integer polyhedra
Example: about N²/2 instances of S0 (exactly N(N−1)/2), denoted S0<i,j>
Represented as the polyhedron {i,j | 1≤i<N, 1≤j≤i} (geometric view)
for (i=1; i<N; i++) for (j=1; j<=i; j++) S0;
[Figure: geometric view: the triangle in the (i,j) plane bounded by 1≤i, i<N, 1≤j, and j≤i]
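The instance count follows from summing the inner trip count: row i holds i points, so the total is 1 + 2 + ... + (N−1) = N(N−1)/2. A small sketch checking this against the loop itself (the function names are hypothetical):

```c
#include <stdbool.h>

/* Domain of S0 in  for (i=1; i<N; i++) for (j=1; j<=i; j++) S0;
   i.e. { [i,j] : 1 <= i < N, 1 <= j <= i }. */
bool in_triangle(int i, int j, int N) {
    return 1 <= i && i < N && 1 <= j && j <= i;
}

/* Exact cardinality: sum_{i=1}^{N-1} i = N(N-1)/2, roughly N^2/2. */
long triangle_size(long N) {
    return N > 1 ? N * (N - 1) / 2 : 0;
}

/* Count the instances the loop nest actually executes. */
long triangle_size_enumerated(int N) {
    long count = 0;
    for (int i = 1; i < N; i++)
        for (int j = 1; j <= i; j++)
            count++;
    return count;
}
```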
Examples (Domains)
What are the domains of these statements?
for (i=0; i<=N; i++) { for (j=0; j<=M; j++) { S1; } S2;}
for (i=0; i<=N; i++) { for (j=M; j>=0; j--) { S1; }}
for (i=0; i<=N; i++) { for (j=0; j<=M; j+=2) { S1; }}
for (i=0; i<=N; i++) { for (j=0; j<=M; j++) { if (j>i) S1; }}
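One possible set of answers, written as integer sets in the notation used above. This sketch assumes the first example's inner bound reads j<=M; written as 0<=M, the condition never changes and the inner loop would not terminate when M≥0. Other, equivalent constraint forms are possible.

```latex
\begin{aligned}
&\text{(1)}\quad D_{S1} = \{[i,j] : 0 \le i \le N,\ 0 \le j \le M\}, \qquad D_{S2} = \{[i] : 0 \le i \le N\}\\
&\text{(2)}\quad D_{S1} = \{[i,j] : 0 \le i \le N,\ 0 \le j \le M\} \quad \text{(same set, visited with decreasing } j\text{)}\\
&\text{(3)}\quad D_{S1} = \{[i,j] : 0 \le i \le N,\ 0 \le j \le M,\ \exists k \in \mathbb{Z} : j = 2k\}\\
&\text{(4)}\quad D_{S1} = \{[i,j] : 0 \le i \le N,\ 0 \le j \le M,\ j \ge i + 1\}
\end{aligned}
```

Case (3) is not a plain polyhedron because of the stride, which is exactly the situation the Z-polyhedron notion covers. In case (4), the guard j>i becomes the affine constraint j≥i+1 over the integers.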
Z-Polyhedron
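Polyhedron with holes: intersection with lattices; image of a domain by an affine function
Just a polyhedron in a higher-dimensional space
Example: 0<=i<=N and i%2=0 is the same set as 0<=i<=N and i=2j for some integer j, i.e., the projection onto i of a polyhedron in the (i,j) plane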
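A brute-force sketch of the equivalence in the example, with hypothetical function names: the strided set {i : 0≤i≤N, i%2=0} contains exactly the values of i for which some integer j satisfies 0≤i≤N and i=2j, so it is the projection of an ordinary polyhedron with one extra dimension.

```c
#include <stdbool.h>

/* Z-polyhedron form: { i : 0 <= i <= N and i % 2 == 0 }. */
bool in_even_set(int i, int N) {
    return 0 <= i && i <= N && i % 2 == 0;
}

/* Projection form: keep i if some integer j puts (i,j) in
   { [i,j] : 0 <= i <= N and i == 2j }. Any such j satisfies 0 <= 2j <= N,
   so it suffices to search that range. */
bool in_projection(int i, int N) {
    for (int j = 0; 2 * j <= N; j++)
        if (0 <= i && i <= N && i == 2 * j) return true;
    return false;
}
```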
Dependence Functions
Affine functions over statement instances
Dataflow: (i,j→i,j+1)
Dependence: (i,j→i,j-1)
for (i=1; i<N; i++) for (j=1; j<M; j++) S0: A[i][j] = A[i][j-1];
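The dependence function (i,j→i,j-1) can be cross-checked by brute force on concrete parameters. This sketch (the function name `last_writer` is hypothetical) scans the instances that execute before S0<i,j> in lexicographic order and keeps the latest one that wrote the cell S0<i,j> reads. This is the question Array Dataflow Analysis answers symbolically.

```c
#include <stdbool.h>

/* Loop nest:  for (i=1; i<N; i++) for (j=1; j<M; j++) S0: A[i][j] = A[i][j-1];
   Find the instance of S0 that last wrote A[i][j-1] before S0<i,j> runs.
   Returns false when the value comes from before the loop nest. */
bool last_writer(int i, int j, int N, int M, int *wi, int *wj) {
    bool found = false;
    for (int p = 1; p < N; p++)
        for (int q = 1; q < M; q++) {
            /* stop once (p,q) is no longer lexicographically before (i,j) */
            if (p > i || (p == i && q >= j)) return found;
            /* S0<p,q> writes A[p][q]; it matters only if that is A[i][j-1] */
            if (p == i && q == j - 1) { *wi = p; *wj = q; found = true; }
        }
    return found;
}
```

For every instance with j≥2 the search returns (i,j-1), matching the dependence function; for j=1 it finds no writer, because A[i][0] is never written inside the nest.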
Dependence Functions
Dependences can be domain qualified
Dataflow: (i,j→i+1,1) if j=M-1; (i,j→i,j+1) otherwise
for (i=1; i<N; i++) for (j=1; j<M; j++) S0: v++;
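A sketch of the piecewise dataflow function for the scalar v (the function name `v_dataflow` is hypothetical). Which affine piece applies depends on where (i,j) lies in the domain, and the very last instance has no consumer inside the nest.

```c
#include <stdbool.h>

/* Dataflow of v in  for (i=1; i<N; i++) for (j=1; j<M; j++) S0: v++;
   The value written by S0<i,j> is read by the next instance in execution
   order. Returns false for the last instance, whose value leaves the nest. */
bool v_dataflow(int i, int j, int N, int M, int *ni, int *nj) {
    if (j == M - 1) {              /* piece j = M-1:  (i,j -> i+1,1) */
        if (i + 1 >= N) return false;
        *ni = i + 1; *nj = 1;
    } else {                       /* otherwise:      (i,j -> i,j+1) */
        *ni = i; *nj = j + 1;
    }
    return true;
}
```

Following the function from S0<1,1> visits every one of the (N-1)(M-1) instances exactly once, in execution order.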
Composing Transformations
Key strength of the framework
[Figure: in the loop world, T1 turns "for i for j ..." into "for j for i ...", and T2 then turns that into "for j for i’ ... for i’’ ..."; in the abstraction, the same transformations map poly to poly’]
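A simplified sketch of why composition is cheap in the abstraction. It assumes purely linear 2-D transformations on a single statement (real polyhedral schedules are affine, with constant offsets and multiple statements). Interchange and skewing are used here as hypothetical stand-ins for T1 and T2. Applying T1 and then T2 to an iteration vector gives the same result as applying the single matrix T2·T1, so a whole sequence of transformations collapses into one before any code is generated.

```c
/* 2x2 integer matrix acting on iteration vectors (i,j). */
typedef struct { int m[2][2]; } Mat2;

/* Returns a*b: the transformation that applies b first, then a. */
Mat2 mat_mul(Mat2 a, Mat2 b) {
    Mat2 r;
    for (int i = 0; i < 2; i++)
        for (int j = 0; j < 2; j++)
            r.m[i][j] = a.m[i][0] * b.m[0][j] + a.m[i][1] * b.m[1][j];
    return r;
}

/* y = t * x */
void apply(Mat2 t, const int x[2], int y[2]) {
    y[0] = t.m[0][0] * x[0] + t.m[0][1] * x[1];
    y[1] = t.m[1][0] * x[0] + t.m[1][1] * x[1];
}
```

With interchange = [[0,1],[1,0]] and skew = [[1,0],[1,1]], the instance (3,5) is mapped step by step to (5,3) and then to (5,8), and the composed matrix skew·interchange = [[0,1],[1,1]] maps it to (5,8) directly.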
Parametric Analysis
Real-world code is filled with parameters: code for an NxM matrix, not a 100x200 one
If the code is not parametric and compilation time is not a big deal, it is an “easy” problem
Dealing with (potentially) infinitely many different executions of a program
What is the last iteration?
Key analysis: which instance last wrote to A[k]?
It can be formulated as an ILP: 0<i<N, 0<j<=i, i+j=k; find the lexicographically maximum (i,j)
Many analysis questions become ILPs for regular programs
for (i=1; i<N; i++) for (j=1; j<=i; j++) S0: A[i+j] = ...;
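A sketch of the ILP for this loop, with hypothetical function names. The brute-force version solves it for concrete N and k by scanning instances in execution order, so the last match is the lexicographic maximum. The closed form is a hand derivation for this example only, shown as the kind of piecewise, parametric answer a parametric integer programming solver produces: maximize i first, giving i = min(N−1, k−1) and j = k−i, which is feasible only when j≤i, i.e. 2 ≤ k ≤ 2(N−1).

```c
#include <stdbool.h>

/* Loop nest:  for (i=1; i<N; i++) for (j=1; j<=i; j++) S0: A[i+j] = ...;
   Lexicographic maximum of (i,j) subject to 0 < i < N, 0 < j <= i, i + j == k,
   found by enumeration (later in execution order = lexicographically larger). */
bool last_write_brute(int N, int k, int *wi, int *wj) {
    bool found = false;
    for (int i = 1; i < N; i++)
        for (int j = 1; j <= i; j++)
            if (i + j == k) { *wi = i; *wj = j; found = true; }
    return found;
}

/* Hand-derived parametric solution for this example only. */
bool last_write_closed(int N, int k, int *wi, int *wj) {
    if (k < 2 || k > 2 * (N - 1)) return false;    /* no instance writes A[k] */
    int i = (N - 1 < k - 1) ? N - 1 : k - 1;       /* i = min(N-1, k-1) */
    *wi = i;
    *wj = k - i;
    return true;
}
```

For N = 5, A[4] is last written by S0<3,1> and A[8] by S0<4,4>, while A[9] is never written.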
Parametric Integer Programming
Constraints: j≤10, i+j≤10, j−i≤N, i,j≥0, N>0
Objective: maximize j
Parametric solution: (0,N) if N≥10, branch N−j+i≥0; (N,N) if N<10, branch N−j+i<0
[Figure: the feasible region in the (i,j) plane bounded by j≤10, i+j≤10, and j−i≤N, with the objective "maximize j"]
Steps: 1. look at the sign of the constraints; 2. create branches for each case
History of the Polyhedral Model: also the layout for Part I of the class
Keep in mind that history is not objective
Origins of the Polyhedral Model: Two Starting Points
Loop program analysis; systems of recurrence equations
Loop view: Is this loop parallel? What are the dependences?
Equational view: Is this system of equations executable? How do we find legal schedules?
Polyhedral Timeline
[Figure: timeline from 1970 through 1990 to 2000 and beyond. Two starting points: recurrence equations and systolic arrays; loop dependence analysis and loop transformation. Milestones: Parametric Integer Programming (1988) and Array Dataflow Analysis (1991), followed by scheduling, code generation, and memory allocation; later targets: multi-core, GPGPU, distributed memory]
Polyhedral Model: Short Story
[Figure: from a (very) subjective point of view … (originally by Steven Derrien). Tools: Polylib and PIP (early 90s), Cloog (2003), Pluto (2008). Targets: VLSI, massively parallel processor arrays, FPGA, MPSoC, multi-core, GPU. Topics: automatic parallelization for shared and distributed memory machines; multi-dimensional process networks for system-level design; memory optimization for embedded multimedia; loop transformations for HLS; the multi-core era]
Polyhedral Equational Model
Idea: map computations to code/hardware; computations are specified as equations
Example: Matrix Multiply
for i in 0 .. P for j in 0 .. Q for k in 0 .. R C[i][j] += A[i][k] * B[k][j];
$$C[i,j,k] = \begin{cases} A[i,k] \cdot B[k,j] & \text{if } k = 0 \\ A[i,k] \cdot B[k,j] + C[i,j,k-1] & \text{if } k > 0 \end{cases}$$
$$C[i,j] = \sum_k A[i,k] \cdot B[k,j]$$
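A sketch that evaluates the recurrence equations directly, assuming row-major flat arrays and the inclusive ranges 0..P, 0..Q, 0..R from the slide (the function name `matmul_equations` is hypothetical). Every value C[i,j,k] gets its own memory cell (single assignment), which is how the equational view treats it. Choosing a schedule and reusing memory are separate decisions made afterwards.

```c
/* Evaluate
     C[i,j,k] = A[i,k]*B[k,j]                 if k == 0
     C[i,j,k] = A[i,k]*B[k,j] + C[i,j,k-1]    if k >  0
   for 0<=i<=P, 0<=j<=Q, 0<=k<=R, storing every C[i,j,k].
   Row-major layouts: A is (P+1)x(R+1), B is (R+1)x(Q+1),
   C3 is (P+1)x(Q+1)x(R+1). The product is C[i,j] = C[i,j,R]. */
void matmul_equations(int P, int Q, int R,
                      const double *A, const double *B, double *C3) {
    int nj = Q + 1, nk = R + 1;
    for (int i = 0; i <= P; i++)
        for (int j = 0; j <= Q; j++)
            for (int k = 0; k <= R; k++) {
                double prod = A[i * nk + k] * B[k * nj + j];
                C3[(i * nj + j) * nk + k] =
                    (k == 0) ? prod : prod + C3[(i * nj + j) * nk + (k - 1)];
            }
}
```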
The Connection
Array Dataflow Analysis [Feautrier 1991]: converts loops to equations; limited to affine loops
domain: {[i,j,k] : 0≤i≤P ∧ 0≤j≤Q ∧ 0≤k≤R}
dependences: S0<i,j,k> → S0<i,j,k-1> (for k>0); dataflow: (i,j,k→i,j,k+1)
for i in 0 .. P for j in 0 .. Q for k in 0 .. R S0: C[i][j] += A[i][k] * B[k][j];
Next Time
Dependence analysis: Array Dataflow Analysis; legality of transformations